Zefan Yu
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year: 1 (2026)
Top categories
AI×1
Frequent co-authors
Research Timeline
2026
BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents
The paper introduces BenchTrace, a novel benchmark designed to rigorously evaluate the self-evolution and reflection capabilities of LLM agents, revealing that current models struggle with accurate failure diagnosis and generalizing learned lessons.
Papers
cs.AI · Recent · May 28, 2026
BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents
Jiahao Huang, Fei Cheng, Junfeng Jiang, Zefan Yu +1 more
The paper introduces BenchTrace, a novel benchmark designed to rigorously evaluate the self-evolution and reflection capabilities of LLM agents, revealing that current models struggle with accurate fa…