Yue Cheng
2 indexed papers
Publications per year
Top categories
Frequent co-authors
Research Timeline
The paper introduces a Hybrid Utility Minimum Bayes Risk (HUMBR) framework to significantly reduce hallucinations in high-stakes enterprise AI workflows, outperforming standard consistency methods.
This paper investigates the non-monotonic role of sample difficulty in Reinforcement Learning with Verifiable Reward (RLVR), finding that medium-difficulty problems provide the most balanced and beneficial learning signals for LLMs.
Papers
Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs
Yue Cheng, Jiajun Zhang, Xiaohui Gao, Weiwei Xing +2 more
This paper investigates the non-monotonic role of sample difficulty in Reinforcement Learning with Verifiable Reward (RLVR), finding that medium-difficulty problems provide the most balanced and benef…