~ similar to 2604.04978v2· 20 results
Yubin Qu, Ying Zhang, Yanjun Zhang, Gelei Deng +3 more
The paper introduces OverEager-Gen, a new benchmark that measures 'overeager actions'—where coding agents perform unauthorized tasks beyond a benign request—and finds that removing explicit consent de…
The paper introduces VibeGuard, a pre-publish security gate framework designed to detect novel vulnerabilities—such as source map exposure and packaging drift—that arise from developers over-relying o…
Zheng Yan, Jingxiang Weng, Charles Chen, Dengyun Peng +8 more
The paper introduces a new benchmark and decomposition method, Sufficiency-Tightness Decomposition, demonstrating that current coding agents struggle to accurately infer least-privilege authorization,…
Baoyuan Wu, Qingshan Liu, Adel Bibi, Irwin King +1 more
The paper argues that the Authorization-Execution Gap (AEG)—the divergence between intended authorization and actual execution—is a critical safety and security flaw in open-world agents, requiring so…
The paper introduces MOSAIC-Bench, a benchmark demonstrating that coding agents can ship exploitable code by complying with seemingly innocuous, staged tasks, a vulnerability that is not easily mitiga…
The paper introduces Agent Control Protocol (ACP), a stateful temporal admission control mechanism that enforces behavioral properties over execution traces to prevent harmful patterns from individual…
The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…
The paper benchmarks current frontier computer-using agents against hand-crafted attacks, finding that while they are highly safe in browser tasks, this safety does not generalize to other domains lik…
Jiangrong Wu, Yuhong Nan, Yixi Lin, Huaijin Wang +3 more
SkillScope introduces a graph-based framework to enforce fine-grained least-privilege in LLM Agent Skills, significantly reducing over-privileged actions while maintaining task functionality.
Muhammad Bilal, Jon Crowcroft, Ruizhi Wang, Xiaolong Xu +1 more
The paper surveys the use of LLMs for agentic NetOps and AIOps, arguing that operational reliability depends not on the model itself, but on robust surrounding machinery and workflow-centered evaluati…
The paper introduces False Security Confidence (FSC), a new metric to measure the inherent prevalence of security vulnerabilities in code generated by LLMs that are otherwise functionally correct, eve…
Suliu Qin, Haomin Zhuang, Yujun Zhou, Yufei Han +1 more
AIRGuard is a runtime authority control guard that operationalizes least privilege to prevent language agents from executing unauthorized side effects, significantly reducing attack success rates on a…
Suliu Qin, Haomin Zhuang, Yujun Zhou, Yufei Han +1 more
AIRGuard is a runtime authority control guard that operationalizes least privilege to prevent agent attacks by enforcing step-level authorization over external side effects.
The paper introduces RADAR, a risk-aware automated code review system, demonstrating that it can significantly reduce review bottlenecks and improve efficiency for AI-generated code without compromisi…
Quan Zhang, Lianhang Fu, Lvsi Lian, Gwihwan Go +4 more
The paper introduces GrantBox, a new security sandbox that evaluates how well LLM agents handle real-world tool privileges, finding that agents remain highly vulnerable to sophisticated attacks.
Neil Fendley, Zhengyu Liu, Aonan Guan, Jiacheng Zhong +1 more
The paper introduces JAW, a novel framework that demonstrates how adversaries can hijack agentic workflows on automation platforms like GitHub Actions by manipulating inputs based on context-grounded…
The paper proposes extbackslash codeName, a behavioral firewall that uses a parameterized deterministic finite automaton (pDFA) to enforce verified benign tool-call sequences and parameter bounds for…
Xuwei Ding, Skylar Zhai, Linxin Song, Jiate Li +5 more
The paper introduces OS-BLIND, a benchmark demonstrating that current safety evaluations fail to detect critical vulnerabilities in computer-use agents when user instructions are benign, showing high…
Alexandra Souly, Robert Kirk, Jacob Merizian, Abby D'Cruz +1 more
The study evaluated four frontier AI models to assess their reliability in following safety research goals, finding no confirmed instances of sabotage but noting that certain models frequently refuse…
The paper introduces the Open Agent Passport (OAP), a deterministic pre-action authorization framework that intercepts and validates AI agent tool calls against a declarative policy, achieving a 0% su…