Xiaohan Wang
3 indexed papers
Research Timeline
This paper analyzes the limitations of Counterfactual Knowledge Training (CFT) for LLM unlearning, identifying knowledge conflict and hallucination spillover as major pitfalls that hinder its effectiveness.
ZipRL introduces an adaptive context compression framework that significantly improves the performance and efficiency of LLMs in complex, multi-turn agent tasks by combining multi-granularity compression with Hindsight Response Replay.
This paper proposes two horizon-control strategies for On-Policy Distillation (OPD), Progressive OPD (POPD) and Truncated OPD (TOPD), and demonstrates that full rollouts are often unnecessary, yielding significant improvements in training efficiency.
Papers
Are Full Rollouts Necessary for On-Policy Distillation?
Yaocheng Zhang, Jiajun Chai, Yuqian Fu, Songjun Tu +6 more