Yuxiao Yang
1 indexed paper
Recent (6 mo)
1With code
0Influential cites
0Benchmarked
0Publications per year
126
Top categories
ML×1AI×1
Frequent co-authors
Research Timeline
2026
Return-to-Go Is More Than a Number: Q-Guided Alignment for Return-Conditioned Supervised Learning
The paper introduces Q-ALIGN DT, a novel framework that improves conditioned sequence models by enforcing alignment between the input return-to-go (RTG) signal and the output policy's expected Q-value, leading to superior policy controllability and performance.
Highlighted terms show continued research focus across papers