Yongqiang Chen
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year: 1 (2026)
Top categories
ML ×1, AI ×1, NLP ×1
Research Timeline
2026
Hista and Numca: Estimate State Value Effectively for LLM Reinforcement Learning
This paper introduces Numca and Hista, two novel techniques that significantly improve state value estimation for LLM reinforcement learning, addressing the instability of standard critic approaches.
Papers
cs.LG, cs.AI, cs.CL · Recent · May 28, 2026
Hista and Numca: Estimate State Value Effectively for LLM Reinforcement Learning
Zizhe Chen, Jiqian Dong, Yizhou Tian, Garry Yang +3 more