~ similar to 2605.31042· 20 results
Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more
This paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a dynamic defense mechanism that traces and sanitizes untrusted control content i…
ClawGuard is a novel runtime security framework that deterministically enforces user-confirmed rules at tool-call boundaries to protect LLM agents from indirect prompt injection.
Zonghao Ying, Haozheng Wang, Jiangfan Liu, Quanchen Zou +4 more
AgentVisor is a novel defense framework that uses semantic virtualization, inspired by OS principles, to significantly reduce LLM agent vulnerability to prompt injection while maintaining high utility…
The paper introduces a systematic framework and defense mechanisms to analyze and mitigate autonomous LLM agent worms that propagate through persistent agent state and cross-platform multi-agent syste…
Fazhong Liu, Zhuoyan Chen, Tu Lan, Haozhen Tan +5 more
This paper identifies and characterizes 'guidance injection,' a stealthy attack vector that embeds adversarial operational narratives into autonomous coding agents' bootstrap guidance, demonstrating h…
The paper introduces Prompt Control-Flow Integrity (PCFI), a priority-aware runtime defense that models LLM prompts as structured segments to intercept prompt injection attacks with high accuracy and…
This paper analyzes the security of LLM-based autonomous agents by drawing parallels to operating system security, finding that while some vulnerabilities are inherent, many can be mitigated using est…
PlanGuard is a training-free defense framework that uses an isolated Planner and hierarchical verification to defend LLM agents against Indirect Prompt Injection by verifying the consistency of planne…
The paper proposes the Layered Attack Surface Model (LASM), a structural taxonomy that maps security threats and defenses across the complex, multi-layered architecture of AI agents, revealing signifi…
Shihao Weng, Yang Feng, Jinrui Zhang, Xiaofei Xie +2 more
The paper introduces ARGUS, a defense mechanism that uses provenance-aware decision auditing to protect LLM agents from sophisticated, context-aware prompt injection attacks, significantly reducing th…
Pengyu Sun, Qishu Jin, Enhao Huang, Zifeng Kang +3 more
VIPER-MCP is a novel, end-to-end automated framework that detects and dynamically confirms the exploitability of taint-style vulnerabilities in Model Context Protocol (MCP) servers, achieving high-fid…
Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa +4 more
The paper proposes a vision for system-level defenses against indirect prompt injection attacks targeting AI agents, emphasizing structured control and human oversight.
This paper systematically maps the expanded attack surface of agentic AI systems, identifying new threat vectors like RAG poisoning and cross-agent manipulation, and proposes a comprehensive security…
AgentShield is a deception-based framework that detects successful indirect prompt injections in tool-using LLM agents across multiple languages by placing traps within the agent's tool interface.
The paper introduces AgentSecBench, a security evaluation framework that measures prompt injection, privacy leakage, and tool-use integrity in LLM agents by defining formal security games and testing…
This paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically analyzing how the injection depth and payload framing affect attack success rates, finding that inje…
The paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically varying the injection depth, payload framing, and turn budget, finding that injection depth is the do…
Yuanbo Xie, Tianyun Liu, Yingjie Zhang, Suchen Liu +3 more
The paper introduces and analyzes cross-session stored prompt injection, demonstrating that persistent system state transforms prompt injection from a temporary model-level threat into a long-lived, s…
This survey analyzes the unique security threats posed by complex, multi-agent AI systems and proposes Confidential Computing (CC) using Trusted Execution Environments (TEEs) as a hardware-rooted defe…
AgentWatcher is a novel, rule-based monitor designed to detect prompt injection attacks in LLM agents by focusing detection on causally influential context segments, thereby improving scalability and…