Mukul Gagrani
I am a staff research scientist at Qualcomm AI Research. My research currently focuses on the efficiency of Large Language Models (LLMs), specifically developing cutting-edge solutions for edge inference. Key areas of my recent research include:
- Speculative Decoding (SPD): pioneered the application of SPD to multimodal models (spotlight paper at a CVPR 2024 workshop), and developed tree-based SPD and vocabulary trimming for efficient drafting
- KV cache compression and FFN sparsity for efficient LLM inference
Before shifting focus to LLM efficiency, I worked on Machine Learning for Combinatorial Optimization, where I developed neural architectures like Topoformer for computation graph scheduling in ML compilers. My PhD work focused on Reinforcement Learning and stochastic control, specifically Thompson sampling for unknown Markov Decision Processes and Linear Systems.
I obtained my PhD in Electrical & Computer Engineering from USC in 2020 under the supervision of Dr. Ashutosh Nayyar and Dr. Rahul Jain. I completed my undergraduate degree in Electrical Engineering at IIT Kanpur in 2013.