Soeun Kim
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year
2026: 1
Top categories
AI ×1, NLP ×1, ML ×1
Research Timeline
2026
Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR
The paper introduces REFT, a method that diversifies rollouts by sampling the first token that follows the reasoning marker. According to the summary, this improves performance in Reinforcement Learning with Verifiable Rewards (RLVR) without altering the core RLVR pipeline.
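To make the idea of first-token diversification concrete, here is a minimal sketch in Python. It is not the paper's implementation: the function names, the `<think>` marker, the example tokens and probabilities, and the choice of weighted sampling without replacement are all assumptions made for illustration. It only shows one plausible way to give each rollout a different first token after the reasoning marker while leaving the rest of generation untouched.

```python
import random


def sample_first_tokens(probs, num_rollouts, seed=0):
    """Pick one first token per rollout, preferring distinct tokens.

    probs: dict mapping candidate first tokens to positive probabilities
           (hypothetical values; a real system would read them from the model).
    """
    if not probs:
        raise ValueError("probs must contain at least one candidate token")
    rng = random.Random(seed)
    remaining = dict(probs)
    chosen = []
    # Weighted sampling without replacement: each draw removes the picked token.
    while remaining and len(chosen) < num_rollouts:
        tokens = list(remaining)
        weights = [remaining[t] for t in tokens]
        pick = rng.choices(tokens, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    # If there are more rollouts than candidates, reuse the picks in order.
    i = 0
    while len(chosen) < num_rollouts:
        chosen.append(chosen[i])
        i += 1
    return chosen


def build_prefixes(prompt, marker, first_tokens):
    """Attach the marker and a forced first token to the prompt for each rollout."""
    return [prompt + marker + tok for tok in first_tokens]


# Hypothetical first-token distribution after the reasoning marker.
probs = {"First": 0.5, "Let": 0.3, "We": 0.2}
tokens = sample_first_tokens(probs, num_rollouts=3, seed=1)
prefixes = build_prefixes("Q: 2+2?\n", "<think>", tokens)
```

Each prefix would then be passed to the unchanged rollout sampler, so in this sketch the diversification touches only the first decoding step.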