Education

Sep 2021 – Dec 2026 (Expected)
Doctor of Philosophy, Electrical & Computer Engineering
Carnegie Mellon University — Pittsburgh, PA
Co-Advisors: Prof. J.P. Shen & Prof. Shawn Blanton · CMU NCAL, CMU ACTL, UCF UNARY Research Groups
CIT Dean's Fellowship · 2023 Qualcomm Innovation Fellowship Winner
Jan 2020 – May 2021
Master of Science, Electrical & Computer Engineering
Carnegie Mellon University — Pittsburgh, PA
Jun 2014 – Jul 2018
Bachelor of Technology, Electrical & Electronics Engineering
SRM Institute of Science and Technology — Chennai, India

Research Interests

LLM inference optimization · Deep learning accelerators · Neuromorphic computing · Unary computing · VLSI / ASIC design · Hardware-software co-design · CPU-GPU coupled architectures · Edge AI

Academic and Industry Experience

Jun 2026 – Sep 2026
Artificial Intelligence (AI) Research Scientist Intern · Offer Accepted
Samsung Semiconductor Inc. — San Jose, CA
Mar 2026 – Jun 2026
Silicon Solution Engineering Intern · Offer Accepted
NVIDIA Corporation — Santa Clara, CA
Jun 2024 – Aug 2024
AI Characterization & Tight Coupling Analysis Intern
Samsung Semiconductor Inc. — San Jose, CA
  • Built SKIP, a PyTorch Profiler-based framework that uncovered critical LLM inference bottlenecks, revealing that GH200 suffers 2.8× higher prefill latency and 4× larger CPU-bound regions vs. Intel x86+H100 and AMD x86+A100.
  • Spearheaded a 5-person CMU-Samsung research collaboration from concept to publication; first-authored paper accepted at ISPASS 2025.
Jun 2022 – Dec 2022
AI Architecture & Algorithm Intern · Exemplary Performance Award
MediaTek USA Inc. — San Jose, CA (Full-time Jun–Sep; Part-time Remote Aug–Dec)
  • Developed TubGEMM (ISVLSI 2023) and OzMAC (VLSI-SoC 2024), compute units for edge AI that reduced power consumption by more than 40% while maintaining throughput on the TSMC N5 process.

PhD Research Highlights

2021 – Present
Doctoral Researcher
CMU NCAL / CMU ACTL — Co-Advisors: Prof. J.P. Shen & Prof. Shawn Blanton
  • Collaborated across 4 research groups (CMU NCAL, CMU ACTL, UCF UNARY, NEXUS), delivering 12 peer-reviewed publications and mentoring graduate and undergraduate students.
  • Characterized performance bottlenecks in LLM inference on GH200 through systematic profiling of model configurations; research funded by Samsung Semiconductor.
  • Created Tempus Core, an INT8 temporal-unary convolution accelerator that achieved 53% area reduction, 44% power savings, and 5× iso-area throughput improvement over the baseline NVDLA convolution core.
  • Created TNNGen, an automation framework that compiles PyTorch models to layout-ready netlists and was validated across 7 modalities.
  • Developed TNN7, a set of 9 macros for a 7nm PDK extension to ASAP7, reducing energy-delay product (EDP) by 45% against the baseline design.

Presentations

Fellowships, Awards, and Honors

Teaching Experience

Carnegie Mellon University — Department of Electrical and Computer Engineering

Course | Role | Semesters
18-340/640: Hardware Arithmetic for Machine Learning | Teaching Instructor | 4 semesters (approximately 50 students per semester)
18-743: Neuromorphic Computer Architecture & Processor Design | Teaching Instructor | 5 semesters (approximately 20 graduate students per semester)
18-740: Modern Computer Architecture | Teaching Instructor | 1 semester (approximately 100 students)

Technical Skills

Tools
vLLM · TensorRT · NVIDIA Nsight Systems · Nsight Compute · nvprof · Synopsys Design Compiler · Synopsys VCS · Cadence Genus · Cadence Xcelium · Cadence Innovus · AMD Vivado · Intel Quartus Prime
Programming
Python · PyTorch · SystemVerilog · Verilog · C++ · Tcl
Languages
English · Hindi · Tamil · Japanese

Relevant Coursework

Large Language Models: Methods and Applications · Neuromorphic Computer Architecture · Modern Computer Architecture · Introduction to Machine Learning · Hardware Arithmetic for Machine Learning · Introduction to Embedded Deep Learning · Advanced Digital Integrated Circuit Design · Applied Cryptography · Fundamentals of Computational Biology

Professional Service

Professional Memberships and Honor Societies