~ similar to 2606.01066· 20 results
Weiyang Guo, Zesheng Shi, Zeen Zhu, Yuan Zhou +2 more
This paper introduces a novel backdoor attack (ACB) against Reinforcement Learning with Verifiable Rewards (RLVR), demonstrating that poisoning the training data can implant a backdoor that significan…
The paper demonstrates that using Reinforcement Learning from Verifiable Rewards (RLVR) significantly improves small language models' functional correctness in code generation, particularly when combi…
Ya-Qi Yu, Hao Wang, Fangyu Hong, Xiangyang Qu +14 more
The paper introduces $ ext{RLR}^3$, a novel framework that extends verifiable rewards in Reinforcement Learning to handle partially verifiable, multi-criteria vision-language tasks by integrating robu…
The paper introduces REFT, a novel method that diversifies rollouts by sampling the first token after the reasoning marker, significantly improving performance in Reinforcement Learning with Verifiabl…
Jiasheng Zheng, Boxi Cao, Boxi Yu, Yuzhong Zhang +5 more
The paper introduces Atomic Decomposition and Recombination (ADR), a novel framework that generates genuinely novel and challenging verifiable code tasks, significantly improving the scalability of Re…
Matthias Cosler, Cas Cremers, Bernd Finkbeiner, Mohamed Ghanem +1 more
The paper introduces a reinforcement learning framework, inspired by AlphaZero, to automate and improve the proof search process within the Tamarin protocol analysis tool, resulting in shorter and mor…
The paper introduces Cross-Model Entropy (CME), a novel label-free reward signal that uses an independent verifier model to assess the quality of a generator's output, significantly improving LLM perf…
Ying Li, Hongbo Wen, Yanju Chen, Hanzhi Liu +2 more
The paper introduces Sefz, a semantic fuzzing framework that automatically discovers specification violations in LLM agent skills, finding a significant number of previously unknown exploitable guardr…
This paper proposes a new imitation learning algorithm called DistIL that uses distributional feedback to improve policy improvement and regret guarantees.
Bin Chen, Xinye Liao, Yiming Liu, Xin Liao +1 more
The paper proposes Credit-Attenuated Privileged Feedback (CAPF), a training-time mechanism that uses verifier-side information to guide LLM search agents, significantly improving their performance on…
Jinhe Bi, Aniri, Minglai Yang, Xingcheng Zhou +8 more
EchoRL proposes a lightweight module to exploit valuable learning signals from advantage-degenerated rollouts in Reinforcement Learning with Verifiable Rewards (RLVR), significantly improving LLM post…
The paper introduces a logic-driven framework using a neural certificate function to rigorously evaluate and benchmark the generalization capabilities of reinforcement learning algorithms on unseen ta…
IRDS introduces a novel data selection method that uses a verifier-coupled sparse autoencoder framework to efficiently select high-quality Reinforcement Learning with Verifiable Rewards (RLVR) trainin…
The paper proposes agentic fuzzing, a novel bug-finding approach where deep agents perform direct reasoning based on historical bugs to discover logic bugs in mature codebases.
Hao Wang, Hanchen Li, Qiuyang Mang, Alvin Cheung +2 more
The paper introduces BenchJack, an automated red-teaming system that systematically audits popular AI agent benchmarks, revealing numerous reward-hacking exploits and demonstrating a method to signifi…
Jiacheng Liang, Yao Ma, Tharindu Kumarage, Satyapriya Krishna +4 more
ARES is a novel framework that systematically discovers and mitigates dual vulnerabilities in RLHF systems by simultaneously testing the core LLM and its Reward Model (RM) using structured adversarial…
The paper proposes a two-stage post-training pipeline to create a small, local LLM agent (PrivEsc-LLM) capable of performing Linux privilege escalation, achieving high success rates while drastically…
SDLLMFuzz is a novel dynamic-static framework that combines LLM-based structure-aware input generation with semantic feedback from crash analysis to significantly improve vulnerability discovery in st…
Yue Cheng, Jiajun Zhang, Xiaohui Gao, Weiwei Xing +2 more
This paper investigates the non-monotonic role of sample difficulty in Reinforcement Learning with Verifiable Reward (RLVR), finding that medium-difficulty problems provide the most balanced and benef…