He Zhu
2 indexed papers
Research Timeline
The paper introduces TELBench and the DRIFT framework to enable fine-grained, span-level error localization in deep-research agents, significantly improving the ability to pinpoint exactly where an agent's reasoning fails.
The paper introduces TVIR, a new benchmark and multi-agent framework for deep research, to evaluate and improve the generation of factually reliable, text-visual interleaved reports.
Papers
Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories
Jiaming Wang, Ziteng Feng, Jiangtao Wu, Ruihao Li +7 more