Built by Teycir Ben Soltane
ArXivCSExplorer

Weitong Zhang

2 indexed papers

Recent (6 mo): 2
With code: 0
Influential cites: 0
Benchmarked: 0

Publications per year

2026: 2

Top categories

ML ×2, AI ×2

Frequent co-authors

Tianrun Yu (1×)
Kaixiang Zhao (1×)
Chih-Chun Chen (1×)
Amanda Hughes (1×)
Taylor W. Killian (1×)
Fenglong Ma (1×)

Research Timeline

2026
Return-to-Go Is More Than a Number: Q-Guided Alignment for Return-Conditioned Supervised Learning

The paper introduces Q-ALIGN DT, a framework for return-conditioned sequence models that enforces alignment between the input return-to-go (RTG) signal and the output policy's expected Q-value, with the stated aim of better policy controllability and performance.

LARK: Learnability-Grounded Trajectory Selection for Efficient Reasoning Distillation

LARK introduces a novel learnability-grounded approach for selecting reasoning trajectories, significantly improving the efficiency of reasoning distillation by prioritizing trajectories that the student model can learn from.


Papers

cs.LG · cs.AI · Recent · May 28, 2026

LARK: Learnability-Grounded Trajectory Selection for Efficient Reasoning Distillation

Tianrun Yu, Kaixiang Zhao, Chih-Chun Chen, Amanda Hughes +4 more

LARK introduces a novel learnability-grounded approach for selecting reasoning trajectories, significantly improving the efficiency of reasoning distillation by prioritizing trajectories that the stud…

cs.LG · cs.AI · Recent · May 27, 2026

Return-to-Go Is More Than a Number: Q-Guided Alignment for Return-Conditioned Supervised Learning

Yuxiao Yang, Weitong Zhang

The paper introduces Q-ALIGN DT, a novel framework that improves conditioned sequence models by enforcing alignment between the input return-to-go (RTG) signal and the output policy's expected Q-value…
