Jaideep Ray

1 indexed paper

Top categories

AI×1

Research Timeline

2026

Before the Model Learns the Bug: Fuzzing RLVR Verifiers

The paper introduces a verifier-fuzzing framework for detecting and analyzing failure modes in Reinforcement Learning with Verifiable Rewards (RLVR), a setting in which the learning model can exploit bugs in the reward verifier.

Papers

cs.AI · Recent · May 31, 2026

Before the Model Learns the Bug: Fuzzing RLVR Verifiers

Jaideep Ray
