Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning