Liwen Hu
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year
126
Top categories
AI×1
Frequent co-authors
Research Timeline
2026
CAST: Non-Privileged Clipped Asymmetric Self-Teaching with Advantage Flipping for GRPO
The paper proposes CAST, an answer-free self-distillation method that enhances Group Relative Policy Optimization (GRPO) for verifiable rewards, allowing token-level advantage signals even when all sampled trajectories are uniformly correct or incorrect.
Papers
cs.AI · Recent · May 29, 2026
CAST: Non-Privileged Clipped Asymmetric Self-Teaching with Advantage Flipping for GRPO
Yang Li, Gongle Xue, Yijia Guo, Yuheng Yuan +2 more