Yuqian Fu
2 indexed papers
Research Timeline
The paper demonstrates that the valence structure learned by modern LLMs aligns with human EEG emotional representations, but finds that further supervised alignment is ineffective due to a phenomenon called saturation regularity.
This paper proposes two horizon-control strategies, Progressive OPD (POPD) and Truncated OPD (TOPD), demonstrating that full rollouts are often unnecessary for On-Policy Distillation, leading to significant improvements in training efficiency.
Papers
Are Full Rollouts Necessary for On-Policy Distillation?
Yaocheng Zhang, Jiajun Chai, Yuqian Fu, Songjun Tu +6 more