~ similar to 2605.23643v1· 19 results
The paper introduces a systematic, executable taxonomy of security properties to bridge the gap between theoretical security definitions and their practical implementation in formal verification tools…
Weiyang Guo, Zesheng Shi, Zeen Zhu, Yuan Zhou +2 more
This paper introduces a novel backdoor attack (ACB) against Reinforcement Learning with Verifiable Rewards (RLVR), demonstrating that poisoning the training data can implant a backdoor that significan…
The paper proposes a two-stage post-training pipeline to create a small, local LLM agent (PrivEsc-LLM) capable of performing Linux privilege escalation, achieving high success rates while drastically…
The paper introduces a lightweight, sampling-based cryptographic protocol for verifiable AI inference that drastically reduces proving overhead from minutes to milliseconds by leveraging statistical p…
The paper introduces a logic-driven framework using a neural certificate function to rigorously evaluate and benchmark the generalization capabilities of reinforcement learning algorithms on unseen ta…
The paper introduces a comprehensive security framework, AgentRFC, to systematically analyze and test the security conformance of various AI agent protocols, identifying critical design gaps, especial…
The paper introduces STRIATUM-CTF, a modular agentic framework that uses a standardized context protocol to enable LLMs to perform multi-step, stateful reasoning for general-purpose CTF solving, achie…
Xiang Liu, Sa Song, Zhaowei Zhang, Huiying Lan +5 more
The paper introduces Agora, a domain-aware multi-agent framework that successfully detects deep, previously unknown logic bugs in complex consensus protocols, outperforming existing LLM-based analysis…
COBALT-TLA introduces a neuro-symbolic verification loop that successfully and autonomously discovers novel cross-chain bridge vulnerabilities by integrating an LLM with the TLA+ model checker.
Ya-Qi Yu, Hao Wang, Fangyu Hong, Xiangyang Qu +14 more
The paper introduces $ ext{RLR}^3$, a novel framework that extends verifiable rewards in Reinforcement Learning to handle partially verifiable, multi-criteria vision-language tasks by integrating robu…
The paper proves that platform-deterministic inference is a necessary and sufficient condition for trustworthy AI, establishing that AI trust fundamentally relies on consistent arithmetic.
SentinelAgent introduces a formal framework, the Intent-Preserving Delegation Protocol (IPDP), to secure federal multi-agent AI systems by verifying complex delegation chains against seven properties,…
The paper introduces a novel shielding framework for Robust MDPs (RMDPs) that guarantees safety under worst-case transition probabilities, enabling safe reinforcement learning even when transition dyn…
The paper introduces a verifier-fuzzing framework to detect and analyze failure modes in Reinforcement Learning with Verifiable Rewards (RLVR) where bugs in the reward verifier can be exploited by the…
The paper demonstrates that for edge-native SLMs used in decentralized governance, simpler, intuitive reasoning (System 1) is significantly more robust and efficient than complex, iterative deliberati…
Hao Wang, Hanchen Li, Qiuyang Mang, Alvin Cheung +2 more
The paper introduces BenchJack, an automated red-teaming system that systematically audits popular AI agent benchmarks, revealing numerous reward-hacking exploits and demonstrating a method to signifi…
Wenqi Chen, Ziyan Zhang, Bing Wang, Lin Liu +2 more
The paper introduces Tree-like Self-Play (TSP), a novel framework that treats secure code generation as a fine-grained decision process, significantly improving LLM security by forcing the model to se…
The paper introduces the Reconstructive Authority Model (RAM), a novel framework that proves execution validity by assessing state coverage rather than just state integrity, showing that existing atte…
Tianze Yang, Yucheng Shi, Ruitong Sun, Jingyuan Huang +2 more
The paper introduces TRON, an online, rule-verifiable environment substrate that generates an unbounded stream of fresh, controllable visual reasoning training instances, significantly improving RL pe…