~ similar to 2604.07831v1· 20 results
Chia-Pei, Chen, Kentaroh Toyoda, Anita Lai +1 more
The paper introduces IPI-proxy, an open-source intercepting proxy toolkit designed to red-team web-browsing AI agents by injecting adversarial payloads into live HTTP responses from whitelisted domain…
Ruoqi Guo, Yi Liu, Gelei Deng, Yiheng Xiong +6 more
The paper introduces MIRAGE, a novel pipeline that generates context-aware prompt injection attacks by embedding malicious text into user-generated content regions of mobile screenshots, successfully…
Ruoqi Guo, Yi Liu, Gelei Deng, Yiheng Xiong +6 more
The paper introduces MIRAGE, a novel pipeline that generates context-aware prompt injection attacks by injecting malicious text into user-generated content regions of mobile screenshots, successfully…
The paper benchmarks current frontier computer-using agents against hand-crafted attacks, finding that while they are highly safe in browser tasks, this safety does not generalize to other domains lik…
Yulin Chen, Tri Cao, Haoran Li, Yue Liu +6 more
The paper introduces WebAgentGuard, a novel reasoning-driven, multimodal guard model that effectively detects prompt injection attacks in vulnerable web agents without compromising their functionality…
This paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically analyzing how the injection depth and payload framing affect attack success rates, finding that inje…
The paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically varying the injection depth, payload framing, and turn budget, finding that injection depth is the do…
The paper proposes a general-purpose pipeline to train automated red teaming models capable of generating attacks for arbitrary adversarial goals, overcoming the limitations of current methods that ar…
Tri Cao, Yulin Chen, Hieu Cao, Yibo Li +7 more
The paper proposes WARD, a robust and efficient defense model that secures web agents against prompt injection attacks embedded in web content, achieving high recall and low false positives even again…
AttackEval systematically evaluates the effectiveness of 250 prompt injection prompts across ten attack categories, finding that composite and obfuscation attacks are highly effective against current…
Zhichao Liu, Wenbo Pan, Haining Yu, Ge Gao +2 more
WebTrap introduces a stealthy, mid-task hijacking attack that successfully compromises browser agents during long-horizon tasks by seamlessly fusing malicious instructions with the original user goal.
Weilin Lin, Ziqi Lin, Zhenxing Zhou, Jianze Li +3 more
The paper introduces RedEdit, an agentic red-teaming framework that demonstrates that malicious images can be easily edited to bypass safety classifiers while retaining their harmful semantics.
Mengyao Du, Han Fang, Haokai Ma, Jiahao Chen +3 more
SnapGuard proposes a lightweight, multimodal method to detect prompt injection attacks in screenshot-based web agents by analyzing visual stability and contrast-polarity textual signals, achieving hig…
The paper identifies a critical vulnerability, the Camouflage Detection Gap (CDG), where standard LLM injection detectors fail dramatically when malicious payloads mimic the target domain's language a…
Xuwei Ding, Skylar Zhai, Linxin Song, Jiate Li +5 more
The paper introduces OS-BLIND, a benchmark demonstrating that current safety evaluations fail to detect critical vulnerabilities in computer-use agents when user instructions are benign, showing high…
The paper argues that prompt injection is a fundamental vulnerability in AI agents, proposing that Contextual Integrity (CI) offers a principled framework to understand and mitigate context-sensitive…
This paper provides a large-scale empirical analysis of indirect prompt injections found in webpages, revealing that prompt-based interference is a widespread, persistent, and growing threat targeting…
Zonghao Ying, Haozheng Wang, Jiangfan Liu, Quanchen Zou +4 more
AgentVisor is a novel defense framework that uses semantic virtualization, inspired by OS principles, to significantly reduce LLM agent vulnerability to prompt injection while maintaining high utility…
Yuanbo Xie, Tianyun Liu, Yingjie Zhang, Suchen Liu +3 more
The paper introduces and analyzes cross-session stored prompt injection, demonstrating that persistent system state transforms prompt injection from a temporary model-level threat into a long-lived, s…
The paper introduces an AI red teaming agent that drastically reduces the time and effort required for security testing by allowing operators to define complex attack goals using natural language, com…