Wenhan Yu
2 indexed papers
Research Timeline
Harness-Bench is a diagnostic benchmark that measures how different system "harnesses" affect LLM agent performance in realistic workflows, showing that agent capability must be reported at the level of the model-harness configuration.
ESPO is a novel reinforcement learning algorithm that detects trajectory failure in large language models and terminates rollouts early, significantly improving performance on mathematical reasoning benchmarks while reducing computational cost.
Papers
ESPO: Early-Stopping Proximal Policy Optimization
Zihang Li, Rui Zhou, Yingcheng Shi, Wenhan Yu +7 more