Abstract Summary

TaxBreak decomposes host-visible orchestration overhead in LLM inference into framework translation, CUDA library translation, and kernel launch-path costs. The work introduces the Host-Device Balance Index to make host and device bottlenecks easier to compare and optimize.

Research Context

This paper, TaxBreak, contributes to my research program on LLM inference. It is part of my broader work on efficient ML systems, hardware-software co-design, and deployment-aware computer architecture.

TaxBreak, LLM inference, arXiv

Related Papers