Xuewei Yang

1 indexed paper

Top categories

NLP ×1, ML ×1

Frequent co-authors

Jiachen Yu (1×)

Jie Wu (1×)

Shaoning Sun (1×)

Junjie Wang (1×)

Yujiu Yang (1×)

Research Timeline

2026

Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning

The paper introduces Temperature-Scaled On-Policy Self-Distillation (TS-OPSD), a method that internalizes temperature-based policy reheating into the model parameters to counter entropy collapse in reinforcement learning.

Papers

cs.CL · cs.LG · May 30, 2026

Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning

Xuewei Yang, Jiachen Yu, Jie Wu, Shaoning Sun +2 more