Zhenting Qi

3 indexed papers

Top categories

NLP ×2, Architecture ×1, AI ×1, Vision ×1

Frequent co-authors

Yilun Du (2×)

Chenyu Wang (1×)

Zishen Wan (1×)

Jeffrey Ma (1×)

Shvetank Prakash (1×)

Haebin Do (1×)

Research Timeline

2026

Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing

The paper introduces Dr. DocBench, a difficulty-aware benchmark that tests vision-language models (VLMs) on expert-level, challenging document parsing, and shows that current state-of-the-art models fail on complex, domain-specific structures.

On the Generalization Gap in Self-Evolving Language Model Reasoning

The paper investigates the limits of self-evolution in LLM reasoning under closed-loop settings, finding that self-improvement is significant but consistently falls short of perfect oracle supervision.

ArchEval: Measuring AI Agents as Computer Architects

This paper introduces ArchEval, a benchmark and platform for evaluating LLM agents on computer architecture design and optimization.

Papers

cs.AR · Empirical · Recent · Jul 3, 2026

ArchEval: Measuring AI Agents as Computer Architects

Chenyu Wang, Zishen Wan, Jeffrey Ma, Shvetank Prakash +7 more

This paper introduces ArchEval, a benchmark and platform for evaluating LLM agents on computer architecture design and optimization.

cs.CL · cs.AI · cs.CV · Recent · May 31, 2026