BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents