Xuheng Li

PhD Candidate · Department of Computer Science · University of California, Los Angeles

Email: [FIRST] [dot] [LAST] [at] cs [dot] ucla [dot] edu

CV Google Scholar X GitHub

I am a Ph.D. student in the AGI Lab in the Department of Computer Science at the University of California, Los Angeles, advised by Prof. Quanquan Gu. I received my B.Sc. from the School of Mathematical Sciences at Peking University.

My research focuses on optimization and RL applied to the pre-training and post-training of LLMs. I am also interested in sampling-based methods, including diffusion models and their applications. I am fascinated by how the dynamics of high-dimensional models are shaped by the low-dimensional structure of the data and training algorithms.

News

Apr 2026
Heading to ICLR 2026 in Rio de Janeiro! I will be presenting two works:
- Best-of-Majority: Minimax-Optimal Strategy for Pass@k Inference Scaling
  Main conference, poster session 2 · Poster P4-#4306 · Afternoon of April 23
- Dimension-Independent Convergence of Underdamped Langevin Monte Carlo in KL Divergence
  DeLTa 2026 Workshop

Mar 2026
We just launched EurekaClaw, an AI research agent that captures your Eureka moments!
EurekaClaw · March 22, 2026 · "Just dropped: watch EurekaClaw go from zero → full research paper. Then it goes fully autonomous: literature → ideas → …"

Publications

* denotes equal contribution.


Research

Optimization in High Dimensions

Modern machine learning models are trained on high-dimensional loss landscapes whose behavior is far from well understood. I study how stochastic optimization algorithms interact with the intrinsic low-dimensional structure of data.

Sampling and Diffusion Models

Score-based generative models and Markov chain Monte Carlo samplers share a deep connection through stochastic differential equations. I work on the theoretical foundations of sampling algorithms, and on applying diffusion models to structured domains such as mixed-type electronic health records.

RL in Post-Training and Reasoning of LLMs

Reinforcement learning from human feedback and inference-time scaling are central to aligning and eliciting reasoning in large language models. I develop principled algorithms and statistical frameworks for contextual bandits and inference strategies.

A Little More About Me

Beyond research, hiking and stargazing are two of my favorite activities. I try to make the most of a finite life amid the vastness of nature and the universe.

"Look again at that dot. That's here. That's home. That's us."
Carl Sagan