Yu Lin
3 indexed papers
Research Timeline
The paper demonstrates that on-policy distillation from a strong teacher model significantly improves the performance of compact Automatic Speech Recognition (ASR) models, achieving competitive results with far less audio data than supervised fine-tuning requires.
The paper introduces Lookahead Group Reward to combat Supervision Fidelity Decay (SFD) in on-policy distillation, significantly improving student model performance on long reasoning tasks.
The paper introduces Atomic Decomposition and Recombination (ADR), a framework that generates genuinely novel and challenging verifiable code tasks, significantly improving the scalability of Reinforcement Learning with Verifiable Rewards (RLVR) for LLMs.
Papers
Your Teacher Can't Help You Here: Combating Supervision Fidelity Decay in On-Policy Distillation
Yanjiang Liu, Jie Lou, Xinyan Guan, Yuqiu Ji +6 more
The paper introduces Lookahead Group Reward to combat Supervision Fidelity Decay (SFD) in on-policy distillation, significantly improving student model performance on long reasoning tasks.