Yan Gao
3 indexed papers
Research Timeline
The paper introduces Trust Region On-Policy Distillation (TrOPD), a robust method that stabilizes the on-policy distillation of large language models by restricting training to regions where teacher supervision is reliable.
The paper proposes Deep Research as Rubric (DR-rubric), a novel evidence-driven framework that treats rubric construction itself as a research problem to generate fine-grained, scalable reward signals for open-ended reasoning tasks.
The paper introduces DocFormBench, a new benchmark for content-aware document formatting, and proposes DocFormFlow, a workflow that improves formatting accuracy and efficiency by decoupling target localization from modification execution.
Papers
What to Format and How: A Benchmark and Workflow Approach for Document Formatting
Shihao Rao, Liang Li, Jiapeng Liu, Tong Lin +5 more