Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:
ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Home/Authors/Zihang Li

Zihang Li

1 indexed paper

Recent (6 mo)
1
With code
0
Influential cites
0
Benchmarked
0

Publications per year

1
26

Top categories

ML×1AI×1

Frequent co-authors

Rui Zhou1×
Yingcheng Shi1×
Wenhan Yu1×
Zhewen Tan1×
Zixiang Liu1×
Zeming Li1×

Research Timeline

2026
ESPO: Early-Stopping Proximal Policy Optimization

ESPO is a novel reinforcement learning algorithm that detects trajectory failure in large language models and terminates rollouts early, significantly improving performance on mathematical reasoning benchmarks while reducing computational cost.

Highlighted terms show continued research focus across papers

Papers

cs.LGcs.AIRecentMay 28, 2026

ESPO: Early-Stopping Proximal Policy Optimization

Zihang Li, Rui Zhou, Yingcheng Shi, Wenhan Yu +7 more

ESPO is a novel reinforcement learning algorithm that detects trajectory failure in large language models and terminates rollouts early, significantly improving performance on mathematical reasoning b…

View →