Heming Zou
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year: [chart not reproduced]
Top categories: ML ×1, AI ×1
Research Timeline
2026
RLVR without Ineffective Samples: Group Prioritized Off-Policy Optimization for LLM Reasoning
The paper introduces Group Prioritized Off-Policy Optimization (POPO), a framework that accelerates reinforcement learning (RL) fine-tuning for LLM reasoning by training on effective off-policy batches instead of generating costly additional rollouts.
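The summary describes the mechanism only at a high level. As a hedged illustration, assume "ineffective samples" means GRPO-style groups whose sampled responses all receive the same verifiable reward, so the group-normalized advantage (and therefore the gradient) is zero. A minimal sketch of dropping such groups and topping up the batch from a priority buffer of earlier effective groups might look like the following. `Group`, `PriorityGroupBuffer`, `build_batch`, and the use of reward standard deviation as the priority are hypothetical names and choices made for this illustration; this is not POPO's actual algorithm.

```python
import heapq
import itertools
import statistics
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Group:
    """Hypothetical container: one prompt and the verifiable rewards of its sampled responses."""
    prompt: str
    rewards: List[float]


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: normalize each reward by the group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def is_effective(group: Group) -> bool:
    """All-equal rewards give all-zero advantages, so the group yields no gradient."""
    return statistics.pstdev(group.rewards) > 0.0


class PriorityGroupBuffer:
    """Hypothetical max-priority buffer of effective groups from earlier (off-policy) rollouts.

    Priority is the group's reward std, an illustrative assumption only.
    """

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Group]] = []
        self._tiebreak = itertools.count()  # avoids ever comparing Group objects

    def push(self, group: Group) -> None:
        priority = statistics.pstdev(group.rewards)
        # heapq is a min-heap, so store the negated priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._tiebreak), group))

    def pop(self) -> Group:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def build_batch(fresh: List[Group], buffer: PriorityGroupBuffer, batch_size: int) -> List[Group]:
    """Drop ineffective fresh groups, then top up from the buffer instead of new rollouts."""
    effective = [g for g in fresh if is_effective(g)]
    batch = effective[:batch_size]
    while len(batch) < batch_size and len(buffer) > 0:
        batch.append(buffer.pop())  # highest-priority stored group first
    for g in effective:
        buffer.push(g)  # keep this step's effective groups for possible later reuse
    return batch


if __name__ == "__main__":
    buf = PriorityGroupBuffer()
    buf.push(Group("old", [0.0, 1.0, 0.0, 0.0]))
    fresh = [Group("p1", [1.0, 0.0, 1.0, 0.0]), Group("p2", [1.0, 1.0, 1.0, 1.0])]
    print([g.prompt for g in build_batch(fresh, buf, batch_size=2)])  # ['p1', 'old']
```

Here the all-ones group `p2` is discarded and its slot is filled by the stored group `old` rather than by a new rollout. Because buffered groups were sampled by an older policy, a real trainer would also need an off-policy correction such as importance-sampling ratios or clipping; that part is deliberately omitted from this sketch.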
Papers
cs.LG · cs.AI · Recent · May 31, 2026
RLVR without Ineffective Samples: Group Prioritized Off-Policy Optimization for LLM Reasoning
Yixiu Mao, Yun Qu, Qi Wang, Heming Zou +1 more