Yuxuan Tian
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year: 1 (2026)
Top categories
AI×1
Research Timeline
2026
Harness-Bench: Measuring Harness Effects across Models in Realistic Agent Workflows
The paper introduces Harness-Bench, a diagnostic benchmark that measures how different system 'harnesses' affect LLM agent performance in realistic workflows, showing that agent capability must be reported at the model-harness configuration level.
Papers
cs.AI | Recent | May 27, 2026
Harness-Bench: Measuring Harness Effects across Models in Realistic Agent Workflows
Yilun Yao, Xinyu Tan, Chao-Hsuan Liu, Yaoming Li +8 more