~ similar to 2605.28914v1· 20 results
Suliu Qin, Haomin Zhuang, Yujun Zhou, Yufei Han +1 more
AIRGuard is a runtime authority control guard that operationalizes least privilege to prevent agent attacks by enforcing step-level authorization over external side effects.
AgentTrust is a novel runtime safety layer that intercepts and evaluates AI agent tool calls before execution, achieving high accuracy in detecting unsafe actions across complex and obfuscated scenari…
Jiaqi Luo, Songyang Peng, Jiarun Dai, Zhile Chen +5 more
AgentGuard is an attribute-based access control framework designed to mitigate severe security risks, such as privacy leakage and system compromise, in tool-using LLM-based agents.
AgentWall is a runtime safety layer that intercepts and evaluates all proposed actions from local AI agents against a declarative policy, ensuring safety before execution.
Zonghao Ying, Haozheng Wang, Jiangfan Liu, Quanchen Zou +4 more
AgentVisor is a novel defense framework that uses semantic virtualization, inspired by OS principles, to significantly reduce LLM agent vulnerability to prompt injection while maintaining high utility…
Xiaochong Jiang, Shiqi Yang, Ziwei Li, Lifei Liu +2 more
ChainCaps introduces a novel runtime capability budgeting system that prevents 'permission laundering' in complex tool-using agents, significantly reducing attack success rates while maintaining benig…
ClawGuard is a novel runtime security framework that deterministically enforces user-confirmed rules at tool-call boundaries to protect LLM agents from indirect prompt injection.
Yixiang Zhang, Xinhao Deng, Jiaqing Wu, Yue Xiao +2 more
The paper introduces AgentWard, a lifecycle-oriented, defense-in-depth architecture designed to systematically secure autonomous AI agents by protecting them across all stages of their operation.
This paper systematically analyzes security risks in cloud-hosted, tool-enabled AI agents, concluding that most risks stem from over-privileged tools and capability-intent mismatches rather than novel…
The paper introduces AGENTREDBENCH, a dynamic redteaming benchmark that significantly measures indirect prompt injection threats in LLM agents using third-party integrations, and releases AGENTREDGUAR…
The paper introduces AGENTREDBENCH, a dynamic redteaming benchmark that significantly measures indirect prompt injection threats in LLM agents using SaaS integrations, and releases AGENTREDGUARD, a su…
Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more
This paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a dynamic defense mechanism that traces and sanitizes untrusted control content i…
Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more
The paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a defense mechanism that detects and sanitizes backdoor content planted across mul…
The paper introduces a comprehensive security framework, AgentRFC, to systematically analyze and test the security conformance of various AI agent protocols, identifying critical design gaps, especial…
The paper introduces Policy-First Tooling, a model-agnostic permission layer that significantly enhances the safety and reliability of tool-orchestrated AI workflows by enforcing explicit constraints…
Baoyuan Wu, Qingshan Liu, Adel Bibi, Irwin King +1 more
The paper argues that the Authorization-Execution Gap (AEG)—the divergence between intended authorization and actual execution—is a critical safety and security flaw in open-world agents, requiring so…
The paper proposes Proof-Carrying Agent Actions (PCAA), a runtime-neutral governance model that uses action certificates to consistently track and authorize high-risk actions across diverse and hetero…
The paper proposes an architectural proxy (MCP) to enforce robust, reliable tool access control for LLM agents, demonstrating that this structural enforcement is necessary because prompt-based restric…
PlanGuard is a training-free defense framework that uses an isolated Planner and hierarchical verification to defend LLM agents against Indirect Prompt Injection by verifying the consistency of planne…
Xuwei Ding, Skylar Zhai, Linxin Song, Jiate Li +5 more
The paper introduces OS-BLIND, a benchmark demonstrating that current safety evaluations fail to detect critical vulnerabilities in computer-use agents when user instructions are benign, showing high…