Zefan Yu

1 indexed paper

Top categories

AI ×1

Frequent co-authors

Jiahao Huang 1×

Fei Cheng 1×

Junfeng Jiang 1×

Akiko Aizawa 1×

Research Timeline

2026

BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents

The paper introduces BenchTrace, a benchmark for evaluating the self-evolution and reflection capabilities of LLM agents. It finds that current models struggle to diagnose failures accurately and to generalize the lessons they learn.

Papers

cs.AI · Recent · May 28, 2026

BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents

Jiahao Huang, Fei Cheng, Junfeng Jiang, Zefan Yu +1 more
