~ similar to 2605.17634v1· 20 results
Shihao Weng, Yang Feng, Jinrui Zhang, Xiaofei Xie +2 more
The paper introduces ARGUS, a defense mechanism that uses provenance-aware decision auditing to protect LLM agents from sophisticated, context-aware prompt injection attacks, significantly reducing th…
Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa +4 more
The paper proposes a vision for system-level defenses against indirect prompt injection attacks targeting AI agents, emphasizing structured control and human oversight.
The paper introduces AgentSecBench, a security evaluation framework that measures prompt injection, privacy leakage, and tool-use integrity in LLM agents by defining formal security games and testing…
Vincent Siu, Jingxuan He, Kyle Montgomery, Zhun Wang +3 more
The paper introduces a contextual security framework for LLM agents, defining security properties and reformulating various attacks and defenses based on the context of execution.
Zhichao Liu, Wenbo Pan, Haining Yu, Ge Gao +2 more
WebTrap introduces a stealthy, mid-task hijacking attack that successfully compromises browser agents during long-horizon tasks by seamlessly fusing malicious instructions with the original user goal.
This paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically analyzing how the injection depth and payload framing affect attack success rates, finding that inje…
The paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically varying the injection depth, payload framing, and turn budget, finding that injection depth is the do…
The paper introduces ASPI, a benchmark showing that requiring LLM agents to seek clarification significantly amplifies their vulnerability to prompt injection attacks.
Yuanbo Xie, Tianyun Liu, Yingjie Zhang, Suchen Liu +3 more
The paper introduces and analyzes cross-session stored prompt injection, demonstrating that persistent system state transforms prompt injection from a temporary model-level threat into a long-lived, s…
Pritam Dash, Tongyu Ge, Aditi Jain, Tanmay Shah +1 more
This paper systematically studies memory poisoning attacks in LLM agents, identifying multiple vulnerabilities and proposing a new benchmark to assess the risk.
Zonghao Ying, Haozheng Wang, Jiangfan Liu, Quanchen Zou +4 more
AgentVisor is a novel defense framework that uses semantic virtualization, inspired by OS principles, to significantly reduce LLM agent vulnerability to prompt injection while maintaining high utility…
The paper empirically analyzes the susceptibility of seven widely used AI-assisted development tools (MCP clients) to prompt injection via tool-poisoning, revealing significant disparities in their se…
Priyal Deep, Shane Emmons, Amy Fox, Kyle Bacon +3 more
The paper evaluates prompt injection defenses and finds that only external output filtering, implemented in application code, reliably prevents secret leaks from LLMs, demonstrating that model-based d…
Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more
This paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a dynamic defense mechanism that traces and sanitizes untrusted control content i…
Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more
The paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a defense mechanism that detects and sanitizes backdoor content planted across mul…
The paper introduces Distributed Sentinel, a zero-trust architecture that prevents Context-Fragmented Violations (CFVs) in multi-agent systems by propagating security state across departmental boundar…
The paper introduces Tree structured Injection for Payloads (TIP), a novel black-box attack framework that reliably generates stealthy injection payloads to seize control of LLM agents utilizing the M…
The paper introduces DeepTrap, an automated framework that evaluates security vulnerabilities in agentic language models by manipulating their internal execution contexts, demonstrating that task comp…
PlanGuard is a training-free defense framework that uses an isolated Planner and hierarchical verification to defend LLM agents against Indirect Prompt Injection by verifying the consistency of planne…
AgentWatcher is a novel, rule-based monitor designed to detect prompt injection attacks in LLM agents by focusing detection on causally influential context segments, thereby improving scalability and…