Mukul Gagrani

I am a staff research scientist at Qualcomm AI Research. My research currently focuses on the efficiency of Large Language Models (LLMs), specifically developing cutting-edge solutions for edge inference. Key areas of my recent research include:

Before shifting focus to LLM efficiency, I worked on Machine Learning for Combinatorial Optimization, where I developed neural architectures like Topoformer for computation graph scheduling in ML compilers. My PhD work focused on Reinforcement Learning and stochastic control, specifically Thompson sampling for unknown Markov Decision Processes and Linear Systems.

I obtained my PhD in Electrical & Computer Engineering from USC in 2020 under the supervision of Dr. Ashutosh Nayyar and Dr. Rahul Jain. I completed my undergraduate degree in Electrical Engineering at IIT Kanpur in 2013.

selected publications

  1. Fast Forward: Accelerating LLM Prefill with Predictive FFN Sparsity
    Aayush Gautam, Mukul Gagrani, Junyoung Park, Mingu Lee, Christopher Lott, and 1 more author
    Preprint, 2026
  2. ICML
    VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
    Raghavv Goel, Sudhanshu Agrawal, Mukul Gagrani, Junyoung Park, Yifan Zao, and 7 more authors
    ICML Workshop on Efficient Systems for Foundation Models, 2025
  3. CAOTE: KV Cache Selection for LLMs via Attention Output Error-Based Token Eviction
    Raghavv Goel, Junyoung Park, Mukul Gagrani, Dalton Jones, Matthew Morse, and 3 more authors
    Preprint, 2025
  4. CVPR
    On Speculative Decoding for Multimodal Large Language Models
    Mukul Gagrani*, Raghavv Goel*, Wonseok Jeon, Junyoung Park, Mingu Lee, and 1 more author
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024
  5. ICLR
    Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement
    Wonseok Jeon, Mukul Gagrani, Raghavv Goel, Junyoung Park, Mingu Lee, and 1 more author
    ICLR LLM Agents Workshop, 2024
  6. ICLR
    Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs
    Raghavv Goel, Mukul Gagrani, Wonseok Jeon, Junyoung Park, Mingu Lee, and 1 more author
    ICLR Workshop on Mathematical and Empirical Understanding of Foundation Models, 2024
  7. ICLR
    Neural DAG scheduling via one-shot priority sampling
    Wonseok Jeon*, Mukul Gagrani*, Burak Bartan, Weiliang Will Zeng, Harris Teague, and 2 more authors
    ICLR, 2022
  8. NeurIPS
    Neural topological ordering for computation graphs
    Mukul Gagrani*, Corrado Rainone*, Yang Yang, Harris Teague, Wonseok Jeon, and 5 more authors
    NeurIPS, 2022
  9. Posterior sampling-based reinforcement learning for control of unknown linear systems
    Yi Ouyang, Mukul Gagrani, and Rahul Jain
    IEEE Transactions on Automatic Control, 2019
  10. NeurIPS
    Learning unknown Markov decision processes: A Thompson sampling approach
    Yi Ouyang, Mukul Gagrani, Ashutosh Nayyar, and Rahul Jain
    NeurIPS, 2017