Alexey Gorbatovski
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year: 1 (2026)
Top categories
ML ×1, AI ×1
Research Timeline
2026
Trust-Region Behavior Blending for On-Policy Distillation
The paper introduces Trust-Region Behavior Blending (TRB), a warmup method that improves on-policy distillation by replacing poor early student rollouts with teacher-aligned behavior policies. It reports state-of-the-art performance on math-reasoning tasks.