Abstract Summary

Mugi uses value-level parallelism to restructure two parts of LLM workloads: nonlinear operations and small-batch GEMMs. The paper shows that reorganizing these computations around values, so that they execute in a more parallel pattern, can raise both throughput and efficiency.
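As a rough intuition for what "value-level" reuse can mean, the sketch below shows a generic, textbook-style version of the idea. It is not Mugi's hardware design. It assumes low-precision inputs, so that many elements share the same value. A nonlinear function is then evaluated once per distinct value rather than once per element. A dot product with low-precision integer weights buckets activations by weight value and performs one multiplication per bucket. The function names `nonlinear_by_value` and `dot_by_value` are hypothetical.

```python
from collections import defaultdict


def nonlinear_by_value(xs, f):
    """Evaluate f once per distinct input value, then reuse that result
    for every element holding the same value (hypothetical helper)."""
    results = {v: f(v) for v in set(xs)}
    return [results[v] for v in xs]


def dot_by_value(weights, activations):
    """Dot product with low-precision integer weights (hypothetical helper).

    Activations are first summed into one bucket per distinct weight value
    (additions only). Each bucket sum is then multiplied by its weight once,
    so the number of multiplications equals the number of distinct weights.
    """
    buckets = defaultdict(float)
    for w, a in zip(weights, activations):
        buckets[w] += a
    return sum(w * s for w, s in buckets.items())


if __name__ == "__main__":
    # With 4-bit signed weights (values -8..7), at most 16 multiplications
    # are needed per dot product, however long the vectors are.
    print(nonlinear_by_value([1, 2, 1, 3], lambda v: v * v))
    print(dot_by_value([1, 2, 1, -1], [1.0, 2.0, 3.0, 4.0]))
```

The design point this illustrates is that, once operands are low-precision, the amount of expensive work scales with the number of distinct values rather than with the number of elements. How Mugi maps this idea onto hardware is described in the paper itself.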

Research Context

This paper, published at ASPLOS 2026, contributes to my research program on LLMs and computer architecture. It is part of my broader work on efficient ML systems, hardware-software co-design, and deployment-aware computer architecture.

Tags: LLMs, computer architecture, ASPLOS 2026, value-level parallelism

Related Papers