Zhewen Tan

2 indexed papers

Top categories

AI ×2, ML ×1

Frequent co-authors

Wenhan Yu (2×)

Tong Yang (2×)

Zihang Li (1×)

Rui Zhou (1×)

Yingcheng Shi (1×)

Zixiang Liu (1×)

Research Timeline

2026

Harness-Bench: Measuring Harness Effects across Models in Realistic Agent Workflows

The paper introduces Harness-Bench, a diagnostic benchmark that measures how different system 'harnesses' affect LLM agent performance in realistic workflows, showing that agent capability must be reported at the model-harness configuration level.

ESPO: Early-Stopping Proximal Policy Optimization

ESPO is a reinforcement learning algorithm that detects failing trajectories during large language model rollouts and terminates them early, significantly improving performance on mathematical reasoning benchmarks while reducing computational cost.

Papers

cs.LG, cs.AI, Recent, May 28, 2026

ESPO: Early-Stopping Proximal Policy Optimization

Zihang Li, Rui Zhou, Yingcheng Shi, Wenhan Yu +7 more

cs.AI, Recent, May 27, 2026