Preprint

TaxBreak: Unmasking the Hidden Costs of LLM Inference Through Overhead Decomposition

TaxBreak decomposes host-visible orchestration overhead in LLM inference into framework translation, CUDA library translation, and kernel launch-path costs. The work introduces the Host-Device Balance Index to make host and device bottlenecks easier to compare and optimize.

Authors: P. Vellaisamy, S. Tripathi, V. Natarajan, S.S. Thenarasu, R.D.S. Blanton, J.P. Shen

Venue: arXiv preprint, March 2026

arXiv:2603.12465

Abstract Summary

Research Context

This paper contributes to my research program on LLM inference. It is part of the broader work on efficient ML systems, hardware-software co-design, and deployment-aware computer architecture.

Tags: TaxBreak, LLM inference, arXiv
