Xuewei Yang
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year (2026): 1
Top categories
NLP ×1, ML ×1
Research Timeline
2026
Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning
The paper introduces Temperature-Scaled On-Policy Self-Distillation (TS-OPSD), a novel method that internalizes temperature-based policy reheating into model parameters to combat entropy collapse in reinforcement learning.
Papers
cs.CL, cs.LG | Recent | May 30, 2026
Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning
Xuewei Yang, Jiachen Yu, Jie Wu, Shaoning Sun +2 more