~ similar to 2605.11868v1· 20 results
Red-MIRROR is a novel multi-agent LLM system that automates complex web penetration testing by integrating a memory-reflection backbone, achieving superior performance on industry benchmarks.
This paper provides a large-scale empirical analysis of indirect prompt injections found in webpages, revealing that prompt-based interference is a widespread, persistent, and growing threat targeting…
Zonghao Ying, Haozheng Wang, Jiangfan Liu, Quanchen Zou +4 more
AgentVisor is a novel defense framework that uses semantic virtualization, inspired by OS principles, to significantly reduce LLM agent vulnerability to prompt injection while maintaining high utility…
Wenkui Yang, Chao Jin, Haisu Zhu, Weilin Luo +6 more
The paper introduces Semantic-level UI Element Injection, a novel red-teaming technique that overlays misleading UI elements onto screenshots to significantly improve the attack success rate against s…
The paper introduces an AI red teaming agent that drastically reduces the time and effort required for security testing by allowing operators to define complex attack goals using natural language, com…
The Cognitive Firewall is a hybrid edge-cloud defense architecture that significantly reduces the attack success rate of Indirect Prompt Injection against browser-based AI agents by combining local vi…
The paper introduces AGENTREDBENCH, a dynamic redteaming benchmark that significantly measures indirect prompt injection threats in LLM agents using third-party integrations, and releases AGENTREDGUAR…
The paper introduces AGENTREDBENCH, a dynamic redteaming benchmark that significantly measures indirect prompt injection threats in LLM agents using SaaS integrations, and releases AGENTREDGUARD, a su…
The paper benchmarks current frontier computer-using agents against hand-crafted attacks, finding that while they are highly safe in browser tasks, this safety does not generalize to other domains lik…
Zhichao Liu, Wenbo Pan, Haining Yu, Ge Gao +2 more
WebTrap introduces a stealthy, mid-task hijacking attack that successfully compromises browser agents during long-horizon tasks by seamlessly fusing malicious instructions with the original user goal.
The paper introduces LivePI, a structured, production-like benchmark that rigorously tests the vulnerability of AI agents to indirect prompt injection across multiple real-world input surfaces, reveal…
Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more
This paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a dynamic defense mechanism that traces and sanitizes untrusted control content i…
Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more
The paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a defense mechanism that detects and sanitizes backdoor content planted across mul…
Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa +4 more
The paper proposes a vision for system-level defenses against indirect prompt injection attacks targeting AI agents, emphasizing structured control and human oversight.
Pengyu Sun, Qishu Jin, Enhao Huang, Zifeng Kang +3 more
VIPER-MCP is a novel, end-to-end automated framework that detects and dynamically confirms the exploitability of taint-style vulnerabilities in Model Context Protocol (MCP) servers, achieving high-fid…
Tri Cao, Yulin Chen, Hieu Cao, Yibo Li +7 more
The paper proposes WARD, a robust and efficient defense model that secures web agents against prompt injection attacks embedded in web content, achieving high recall and low false positives even again…
The paper introduces ClawTrap, a MITM-based red-teaming framework, to evaluate the security robustness of web agents like OpenClaw against dynamic, real-world network attacks, finding that model stren…
AgentWatcher is a novel, rule-based monitor designed to detect prompt injection attacks in LLM agents by focusing detection on causally influential context segments, thereby improving scalability and…
PlanGuard is a training-free defense framework that uses an isolated Planner and hierarchical verification to defend LLM agents against Indirect Prompt Injection by verifying the consistency of planne…
This paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically analyzing how the injection depth and payload framing affect attack success rates, finding that inje…