Weitong Zhang
2 indexed papers
Research Timeline
The paper introduces Q-ALIGN DT, a novel framework that improves conditioned sequence models by enforcing alignment between the input return-to-go (RTG) signal and the output policy's expected Q-value, leading to superior policy controllability and performance.
LARK introduces a novel learnability-grounded approach for selecting reasoning trajectories, significantly improving the efficiency of reasoning distillation by prioritizing trajectories that the student model can learn from.
Papers
LARK: Learnability-Grounded Trajectory Selection for Efficient Reasoning Distillation
Tianrun Yu, Kaixiang Zhao, Chih-Chun Chen, Amanda Hughes +4 more