Peer-reviewed paper
TaxBreak: Unmasking the Hidden Costs of LLM Inference Through Overhead Decomposition
Abstract Summary
TaxBreak decomposes host-visible orchestration overhead in LLM inference into framework translation, CUDA library translation, and kernel launch-path costs. The work introduces the Host-Device Balance Index to make host and device bottlenecks easier to compare and optimize.
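As a rough illustration of the idea, a host-device balance metric can be computed from profiled host and device times. This is a minimal sketch in the spirit of the Host-Device Balance Index; the exact formula, the 0-to-1 convention, and the example numbers are assumptions, not taken from the paper.

```python
def balance_index(host_ms: float, device_ms: float) -> float:
    """Fraction of wall time attributable to host-side orchestration.

    Assumed convention: 0.0 -> fully device-bound, 1.0 -> fully host-bound.
    """
    total = host_ms + device_ms
    if total == 0:
        raise ValueError("no time recorded")
    return host_ms / total


# Hypothetical per-step host times for the three overhead classes named above.
host_breakdown_ms = {
    "framework_translation": 1.8,      # e.g. framework op dispatch
    "cuda_library_translation": 0.9,   # e.g. cuBLAS/cuDNN call setup
    "kernel_launch_path": 1.3,         # e.g. driver-level kernel launch path
}
device_ms = 12.0  # GPU kernel execution time (illustrative)

hdbi = balance_index(sum(host_breakdown_ms.values()), device_ms)
print(f"balance index: {hdbi:.2f}")  # -> balance index: 0.25
```

A value near 0.5 would indicate host orchestration and device compute consuming comparable time, the regime where host-side optimizations pay off most.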
Research Context
This paper contributes to my research program in LLM inference, performance analysis, and overhead decomposition. It is part of broader work on efficient ML systems, hardware-software co-design, and deployment-aware computer architecture.