Zhihao Zhang

9219 GHC,

4902 Forbes Ave,

Pittsburgh, PA 15213

I am a third-year Ph.D. student in the Computer Science Department at Carnegie Mellon University. I am a member of the CMU Catalyst research group and am fortunate to be advised by Prof. Zhihao Jia.

Prior to joining the Ph.D. program, I received my Master's degree from the Robotics Institute at Carnegie Mellon University and my B.Sc. in Computer Science from Renmin University of China, where I was advised by Prof. Changliu Liu and Prof. Qin Jin, respectively.

Research interests: Connecting ML with compute.

selected publications

ICML

Accelerating retrieval-augmented language model serving with speculation

Zhihao Zhang, Alan Zhu, Lijie Yang, and 4 more authors

To appear at ICML 2024, 2024

@article{zhang2024accelerating,
  title = {Accelerating retrieval-augmented language model serving with speculation},
  author = {Zhang, Zhihao and Zhu, Alan and Yang, Lijie and Xu, Yihua and Li, Lanting and Phothilimthana, Phitchaya Mangpo and Jia, Zhihao},
  journal = {To appear at ICML 2024},
  year = {2024},
}

ASPLOS

SpecInfer: Accelerating generative LLM serving with speculative inference and token tree verification

Xupeng Miao*, Gabriele Oliaro*, Zhihao Zhang*, and 7 more authors

To appear at ASPLOS 2024, 2023

@article{miao2023specinfer,
  title = {{SpecInfer}: Accelerating generative {LLM} serving with speculative inference and token tree verification},
  author = {Miao, Xupeng and Oliaro, Gabriele and Zhang, Zhihao and Cheng, Xinhao and Wang, Zeyu and Wong, Rae Ying Yee and Chen, Zhuoming and Arfeen, Daiyaan and Abhyankar, Reyna and Jia, Zhihao},
  journal = {To appear at ASPLOS 2024},
  year = {2023},
  cofirst = {4}
}

ICLR

GradSign: Model Performance Inference with Theoretical Insights

Zhihao Zhang and Zhihao Jia

In International Conference on Learning Representations, 2021

@inproceedings{zhang2021gradsign,
  title = {GradSign: Model Performance Inference with Theoretical Insights},
  author = {Zhang, Zhihao and Jia, Zhihao},
  booktitle = {International Conference on Learning Representations},
  year = {2021},
}

NeurIPS

Communication Bounds for the Distributed Experts Problem

Zhihao Jia, Qi Pang, Trung Tran, and 3 more authors (in alphabetical order)

In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

ICLR

TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention

Lijie Yang*, Zhihao Zhang*, Zhuofu Chen, and 2 more authors

2024