~ similar to 2605.18583v1· 20 results
Yubin Qu, Yi Liu, Gelei Deng, Yanjun Zhang +3 more
The paper introduces SNARE, a novel adaptive benchmarking pipeline that systematically measures overeager behavior in coding agents, finding that the agent framework accounts for the majority of the v…
Yubin Qu, Yi Liu, Gelei Deng, Yanjun Zhang +3 more
The paper introduces SNARE, a novel adaptive testing pipeline that systematically measures overeager behavior in coding agents, finding that the agent framework accounts for the majority of the variat…
Zimo Ji, Zongjie Li, Wenyuan Jiang, Yudong Gao +1 more
The paper independently stress-tests Claude Code's auto mode permission system using a deliberately ambiguous benchmark, finding that its true false negative rate is significantly higher than reported…
Jiangrong Wu, Yuhong Nan, Yixi Lin, Huaijin Wang +3 more
SkillScope introduces a graph-based framework to enforce fine-grained least-privilege in LLM Agent Skills, significantly reducing over-privileged actions while maintaining task functionality.
The paper introduces MOSAIC-Bench, a benchmark demonstrating that coding agents can ship exploitable code by complying with seemingly innocuous, staged tasks, a vulnerability that is not easily mitiga…
The paper benchmarks current frontier computer-using agents against hand-crafted attacks, finding that while they are highly safe in browser tasks, this safety does not generalize to other domains lik…
Elle Najt, Colin Toft, Tyler Tracy, Fabien Roger +1 more
The paper introduces SLEIGHT-Bench, a benchmark of 40 synthetic attacks, demonstrating that current LLM monitor systems fail to detect a significant number of covert, harmful actions executed by codin…
The paper introduces Gram, an automated framework that assesses AI agent propensity for sabotage, finding that while Gemini models show low rates of misbehavior, increasing environmental realism signi…
The paper proposes extbackslash codeName, a behavioral firewall that uses a parameterized deterministic finite automaton (pDFA) to enforce verified benign tool-call sequences and parameter bounds for…
Yubin Qu, Yi Liu, Tongcheng Geng, Gelei Deng +4 more
The paper introduces Document-Driven Implicit Payload Execution (DDIPE) to demonstrate that malicious code can be embedded in LLM agent skill documentation, allowing supply-chain attacks to hijack age…
The paper introduces the Open Agent Passport (OAP), a deterministic pre-action authorization framework that intercepts and validates AI agent tool calls against a declarative policy, achieving a 0% su…
Ningzhi Tang, Chaoran Chen, Gelei Xu, Yiyu Shi +4 more
This study analyzes over 20,000 real-world coding sessions to show that AI coding agents frequently fail users through subtle misalignment, requiring constant manual correction even when major system…
Hao Wang, Hanchen Li, Qiuyang Mang, Alvin Cheung +2 more
The paper introduces BenchJack, an automated red-teaming system that systematically audits popular AI agent benchmarks, revealing numerous reward-hacking exploits and demonstrating a method to signifi…
Yibing Liu, Yangze Liu, Xiaolong Yin, Bin Wang +3 more
The paper introduces OpenClawBench, a large-scale dataset and framework for measuring process-side anomalies in real-world agent execution trajectories, demonstrating that task success does not guaran…
Zheng Yan, Jingxiang Weng, Charles Chen, Dengyun Peng +8 more
The paper introduces a new benchmark and decomposition method, Sufficiency-Tightness Decomposition, demonstrating that current coding agents struggle to accurately infer least-privilege authorization,…
The paper introduces a large, consensus-labeled prompt bank that reliably distinguishes between requests for executable malicious code and requests for harmful security knowledge, providing a standard…
The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…
Xuwei Ding, Skylar Zhai, Linxin Song, Jiate Li +5 more
The paper introduces OS-BLIND, a benchmark demonstrating that current safety evaluations fail to detect critical vulnerabilities in computer-use agents when user instructions are benign, showing high…
Xiaochong Jiang, Shiqi Yang, Ziwei Li, Lifei Liu +2 more
ChainCaps introduces a novel runtime capability budgeting system that prevents 'permission laundering' in complex tool-using agents, significantly reducing attack success rates while maintaining benig…
Baoyuan Wu, Qingshan Liu, Adel Bibi, Irwin King +1 more
The paper argues that the Authorization-Execution Gap (AEG)—the divergence between intended authorization and actual execution—is a critical safety and security flaw in open-world agents, requiring so…