Peer-reviewed paper
TaxBreak: Unmasking the Hidden Costs of LLM Inference Through Overhead Decomposition
Abstract Summary
TaxBreak decomposes host-visible orchestration overhead in LLM inference into framework translation, CUDA library translation, and kernel launch-path costs. The work introduces the Host-Device Balance Index to make host and device bottlenecks easier to compare and optimize.
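As a rough illustration of the idea, a host-device balance metric can be computed from profiled host and device times. This is a minimal sketch in the spirit of the Host-Device Balance Index; the exact formula, the 0-to-1 convention, and the example numbers are assumptions, not taken from the paper.

```python
def balance_index(host_ms: float, device_ms: float) -> float:
    """Fraction of wall time attributable to host-side orchestration.

    Assumed convention: 0.0 -> fully device-bound, 1.0 -> fully host-bound.
    """
    total = host_ms + device_ms
    if total == 0:
        raise ValueError("no time recorded")
    return host_ms / total


# Hypothetical per-step host times for the three overhead classes named above.
host_breakdown_ms = {
    "framework_translation": 1.8,      # e.g. framework op dispatch
    "cuda_library_translation": 0.9,   # e.g. cuBLAS/cuDNN call setup
    "kernel_launch_path": 1.3,         # e.g. driver-level kernel launch path
}
device_ms = 12.0  # GPU kernel execution time (illustrative)

hdbi = balance_index(sum(host_breakdown_ms.values()), device_ms)
print(f"balance index: {hdbi:.2f}")  # -> balance index: 0.25
```

A value near 0.5 would indicate host orchestration and device compute consuming comparable time, the regime where host-side optimizations pay off most.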
Research Context
This paper contributes to my research program in LLM inference, performance analysis, and overhead decomposition. It is part of broader work on efficient ML systems, hardware-software co-design, and deployment-aware computer architecture.