Peer-reviewed paper
Mugi: Value Level Parallelism for Efficient LLMs
Abstract Summary
Mugi uses value-level parallelism to restructure two workloads in LLMs: nonlinear operations and small-batch general matrix multiplications (GEMMs). The paper shows that this approach can raise throughput and efficiency by turning per-value computation into a more parallel execution pattern.
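The summary above does not spell out the mechanism. As a hedged illustration of the general idea of value-level reuse (not Mugi's actual hardware design), the sketch below assumes low-precision operands that take only a few distinct values. Work is then done once per distinct value and shared: a nonlinear function is evaluated once per distinct input, and a dot product buckets inputs by weight value so it needs one multiply per distinct weight rather than one per element. The function names `nonlinear_by_value`, `dot_by_value`, and `gemv_by_value` are hypothetical.

```python
from collections import defaultdict


def nonlinear_by_value(xs, fn):
    """Apply fn elementwise, but evaluate fn only once per distinct input value.

    Assumption: xs are low-precision (e.g. quantized) values, so the number of
    distinct values is small compared with len(xs).
    """
    table = {v: fn(v) for v in set(xs)}  # one evaluation per distinct value
    return [table[v] for v in xs]        # every element reuses a table entry


def dot_by_value(x, w):
    """Dot product of x and w, where w takes only a few distinct values.

    Inputs are first summed into one bucket per weight value (additions only),
    then each bucket is multiplied by its weight value once.
    """
    buckets = defaultdict(float)
    for xi, wi in zip(x, w):
        buckets[wi] += xi
    return sum(v * s for v, s in buckets.items())


def gemv_by_value(W, x):
    """Small-batch (batch size 1) matrix-vector product built from dot_by_value."""
    return [dot_by_value(x, row) for row in W]


if __name__ == "__main__":
    x = [1.5, -2.0, 0.5, 3.0]
    W = [[2, -1, 2, -1], [1, 1, 1, 1]]  # hypothetical low-precision integer weights
    print(gemv_by_value(W, x))
    print(nonlinear_by_value([-1, 0, 1, 0, -1], lambda v: max(v, 0)))
```

In the first row of `W`, the four products collapse into two multiplies (one for weight `2`, one for weight `-1`). The savings grow as precision drops and distinct values become rarer relative to vector length.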
Research Context
This paper, published at ASPLOS 2026, contributes to my research program in LLMs and computer architecture. It is part of my broader work on efficient ML systems, hardware-software co-design, and deployment-aware computer architecture.