Haoqing Wang
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year: 1 (2026)
Top categories: ML ×1, NLP ×1
Research Timeline
2026
Trust Region On-Policy Distillation
The paper introduces Trust Region On-Policy Distillation (TrOPD), a robust method that stabilizes the on-policy distillation of large language models by restricting training to regions where teacher supervision is reliable.
Papers
cs.LG, cs.CL · Recent · May 31, 2026
Trust Region On-Policy Distillation
Xingrun Xing, Haoqing Wang, Boyan Gao, Ziheng Li +1 more