Jaideep Ray
1 indexed paper
Recent (6 mo): 1
With code: 0
Influential cites: 0
Benchmarked: 0
Publications per year
Top categories
AI×1
Research Timeline
2026
Before the Model Learns the Bug: Fuzzing RLVR Verifiers
The paper introduces a verifier-fuzzing framework for detecting and analyzing failure modes in Reinforcement Learning with Verifiable Rewards (RLVR), where the learning model can exploit bugs in the reward verifier.