Xing Shi
2 indexed papers
Research Timeline
The paper introduces Canonical-Context On-Policy Distillation (CCOPD) to improve multi-turn language model performance by mitigating 'self-anchored drift,' ensuring consistent answers regardless of whether the evidence is presented in a single prompt or gradually across multiple turns.
The paper proposes EAPO, a framework that enables agentic models to learn when to forgo using external tools, thereby mitigating tool abuse while maintaining high reasoning accuracy.
Papers
Learning When Not to Act: Mitigating Tool Abuse in Agentic Reinforcement Learning
Liuji Chen, Dianxing Tang, Xing Shi, Dingshuo Chen +3 more