PhD Candidate · Carnegie Mellon University

Prabhu Vellaisamy

I am a PhD candidate in Electrical & Computer Engineering at CMU, co-advised by Prof. John Paul Shen and Prof. Shawn Blanton. My research spans LLM inference optimization on CPU-GPU coupled architectures, energy-efficient deep learning accelerators, and neuromorphic computing with Temporal Neural Networks. I received the 2023 Qualcomm Innovation Fellowship and the CIT Dean's Fellowship.

Expected graduation: December 2026. Incoming AI Research Scientist Intern at Samsung Semiconductor (Jun 2026 – Sep 2026) and Silicon Solution Engineering Intern at NVIDIA (Mar 2026 – Jun 2026); both offers accepted.

Current focus

LLM systems, inference bottlenecks, and accelerator-aware optimization.

Research footprint

CMU NCAL, CMU ACTL, UCF UNARY, and NEXUS collaborations across 4 research groups.

Teaching & advising

10 teaching semesters and mentoring support in architecture and VLSI workflows.

pvellais@andrew.cmu.edu · Google Scholar · GitHub · LinkedIn

Research Interests

My work sits at the intersection of systems, architecture, and physical implementation: from LLM inference behavior on coupled CPU-GPU platforms to low-power accelerator design and neuromorphic hardware generation.

LLM Inference Optimization

Profiling and optimizing LLM workloads on CPU-GPU coupled architectures (H100, GH200), with a focus on KV cache efficiency, batching strategies, and kernel-level bottleneck decomposition.

Deep Learning Accelerator Design

Custom GEMM units, convolution cores, and MAC architectures targeting edge AI — leveraging unary/binary hybrid arithmetic for area-power-efficiency trade-offs.

Neuromorphic Computing

Temporal Neural Networks (TNNs), automated RTL-to-GDSII design frameworks, and custom PDK development for neuromorphic sensory processing.

VLSI / ASIC Design

Physical design, floorplanning, clock tree synthesis, and DRC/LVS signoff with the TSMC N5/N7 and ASAP7 PDKs. Hardware-software co-design for AI workloads.

Selected Publications

Representative papers across LLM systems, accelerator architecture, and temporal neuromorphic hardware.

TaxBreak: Unmasking the Hidden Costs of LLM Inference Through Overhead Decomposition

P. Vellaisamy, Y. Deng, S. Chakraborty, M. Scherer, S. Sury, J.P. Shen

IEEE ISPASS 2026 · Accepted

A decomposition of LLM inference overheads that isolates non-matmul costs, exposes where end-to-end latency is lost, and helps redirect optimization effort toward the bottlenecks large-model deployments actually pay for.

Mugi: Value Level Parallelism for Efficient LLMs

D. Price, P. Vellaisamy, J.P. Shen, D. Wu

ACM ASPLOS 2026 · Systems

Generalizes value-level parallelism (VLP) for nonlinear LLM operations and small-batch GEMMs. Up to 45× throughput and 668× energy efficiency for softmax; 2.07× LLM throughput and 3.11× energy efficiency; 1.45× reduction in operational carbon.

Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures

P. Vellaisamy, T. Labonte, S. Chakraborty, M. Turner, S. Sury, J.P. Shen

IEEE ISPASS 2025 · Invited Talk at Jülich Supercomputing Center

Characterizes prefill/decode bottlenecks on H100 vs. GH200: GH200 incurs 2.8× higher prefill latency and a 4× larger CPU-bound region. Samsung-funded ($150K+).

Tempus Core: Area-Power Efficient Temporal-Unary Convolution Core for Low-Precision Edge DLAs

P. Vellaisamy, H. Nair, T. Kang, Y. Ni, H. Fan, B. Qi, H.F. Hung, J. Chen, R.D.S. Blanton, J.P. Shen

IEEE DATE 2025

INT8 temporal-unary convolution core for NVDLA on 7nm: 53% area reduction, 44% power savings, 5× iso-area throughput improvement.

Catwalk: Unary Top-K for Efficient Ramp-No-Leak Neuron Design for Temporal Neural Networks

D. Lister, P. Vellaisamy, J.P. Shen, D. Wu

IEEE ISVLSI 2025 · Best Paper Award

Introduces a unary top-k design for Temporal Neural Networks that improves ramp-no-leak neuron efficiency and earned the Amar Mukherjee Best Paper Award at ISVLSI 2025.

View all 12 publications + 5 workshop papers →

Education

Sep 2021 – Dec 2026 (Expected)

Doctor of Philosophy, Electrical & Computer Engineering

Carnegie Mellon University — Pittsburgh, PA

Advisors: Prof. J.P. Shen & Prof. Shawn Blanton · CMU NCAL, CMU ACTL, UCF UNARY Research Groups.
CIT Dean's Fellowship. 2023 Qualcomm Innovation Fellowship.

Jan 2020 – May 2021

Master of Science, Electrical & Computer Engineering

Carnegie Mellon University — Pittsburgh, PA

Jun 2014 – Jul 2018

Bachelor of Technology, Electrical & Electronics Engineering

SRM Institute of Science and Technology — Chennai, India