~ similar to 2603.19328v1· 20 results
Haoyu Wang, Zibo Xiao, Yedi Zhang, Christopher M. Poskitt +1 more
The paper proposes SafeClaw-R, a novel framework that enforces safety as a system-level invariant over the execution graph to mitigate the high safety and security risks inherent in autonomous multi-a…
The paper introduces SafeAudit, a meta-audit framework that systematically enumerates test cases and uses a quantitative metric to uncover significant residual unsafe behaviors in LLM agents that exis…
Qi Hu, Yifeng Tang, Qinghua Wang, Lanyang Zhao +6 more
The paper introduces SABER, a new benchmark that evaluates the operational safety of LLM coding agents in complex, stateful project environments, finding that current models have a high rate of harmfu…
Chang Jin, An Wang, Zeming Wei, Kai Wang +6 more
The paper introduces SkillSafetyBench, a comprehensive benchmark demonstrating that agent safety failures often stem from adversarial influences within reusable skills and execution environments, rath…
AgentTrust is a novel runtime safety layer that intercepts and evaluates AI agent tool calls before execution, achieving high accuracy in detecting unsafe actions across complex and obfuscated scenari…
The paper introduces BOA, a novel framework that measures agent safety by exhaustively searching the entire in-budget trajectory space, thereby identifying unsafe behaviors missed by traditional sampl…
Xixun Lin, Yang Liu, Yancheng Chen, Yongxuan Wu +7 more
The paper introduces SafeHarness, a novel, lifecycle-integrated security architecture that significantly reduces unsafe behavior and attack success rates in LLM agents by weaving multiple defense laye…
Lichao Wang, Zhaoxing Ren, Tianzhuo Yang, Jiaming Ji +3 more
SafeMCP is a server-side defense plugin that uses look-ahead reasoning to proactively filter and constrain tool acquisition for LLM agents, thereby mitigating catastrophic risks associated with expand…
The paper introduces TraceSafe-Bench, a comprehensive benchmark, and finds that securing LLM agents requires jointly optimizing for structural reasoning and safety alignment to mitigate risks during m…
Jinhu Qi, Muzhi Li, Jiahong Liu, Yuqin Shu +8 more
This survey provides a comprehensive, practical guide to ensuring the trustworthiness of complex, autonomous agentic AI systems by focusing on safety, robustness, privacy, and system security.
Su Wang, Pin Qian, Yihang Chen, Junxian You +5 more
The paper introduces SkillReact, a framework that measures compositional risk in agent skill ecosystems, finding that even if individual skills are safe, their combination can create significant, unad…
Su Wang, Pin Qian, Yihang Chen, Junxian You +5 more
The paper introduces SkillReact, a framework that measures compositional risk in agent skill ecosystems, finding that even if individual skills are safe, their combination can create significant, expl…
Xian Qi Loye, Qinglin Su, Zhexin Zhang, Shiyao Cui +4 more
The paper introduces RUBAS, a rubric-based reinforcement learning framework that improves agent safety by providing fine-grained, multi-dimensional rewards for complex tool-use scenarios.
Yuhui Wang, Tanqiu Jiang, Jiacheng Liang, Charles Fleming +1 more
The paper introduces MAGE, a novel defensive framework that uses a dedicated 'shadow memory' to proactively detect and mitigate long-horizon threats against LLM agents during complex, multi-step inter…
The paper introduces Session Risk Memory (SRM), a lightweight module that enhances per-action authorization gates with trajectory-level risk assessment, significantly improving detection of distribute…
The paper introduces the Open Agent Passport (OAP), a deterministic pre-action authorization framework that intercepts and validates AI agent tool calls against a declarative policy, achieving a 0% su…
Yaoyu Zhao, Yichen Xu, Oliver Bračevac, Cao Nguyen Pham +2 more
The paper introduces LACUNA, a novel programming model that allows LLM agents to write code that shapes the runtime environment while maintaining strong type-checking safety guarantees.
The paper introduces Owner-Harm, a formal threat model addressing the critical blind spot of AI agents harming their own deployers, demonstrating that specialized defenses are needed beyond generic sa…
The paper benchmarks current frontier computer-using agents against hand-crafted attacks, finding that while they are highly safe in browser tasks, this safety does not generalize to other domains lik…
The paper introduces SafeRedirect, a system-level defense that prevents frontier LLMs from generating harmful content during legitimate tasks that structurally require it, significantly reducing unsaf…