~ similar to 2605.03129v1· 20 results
The paper introduces SONAR, a prompt sanitization framework that uses natural language inference metrics to identify and remove malicious instructions injected into LLM prompts, achieving near-zero at…
Soham Roy, Sarthakbrata Halder, Arya Bharaty, Vaibhav Bhaskar +4 more
The paper demonstrates that autonomous web agents are highly susceptible to social-engineering attacks, leaking critical PII even when they internally flag a site as suspicious, necessitating output-l…
Soham Roy, Sarthakbrata Halder, Arya Bharaty, Vaibhav Bhaskar +4 more
The paper demonstrates that autonomous web agents are highly susceptible to social-engineering attacks, leaking critical PII even when they internally flag a site as suspicious, necessitating output-l…
Haozhen Wang, Haoyue Liu, Jionghao Zhu, Zhichao Wang +2 more
The paper introduces PIDP-Attack, a novel compound adversarial attack that combines prompt injection with database poisoning to manipulate Retrieval-Augmented Generation (RAG) systems against arbitrar…
Chia-Pei, Chen, Kentaroh Toyoda, Anita Lai +1 more
The paper introduces IPI-proxy, an open-source intercepting proxy toolkit designed to red-team web-browsing AI agents by injecting adversarial payloads into live HTTP responses from whitelisted domain…
This paper provides a large-scale empirical analysis of indirect prompt injections found in webpages, revealing that prompt-based interference is a widespread, persistent, and growing threat targeting…
AttackEval systematically evaluates the effectiveness of 250 prompt injection prompts across ten attack categories, finding that composite and obfuscation attacks are highly effective against current…
This paper proposes the first web-focused threat model for agentic browsers, demonstrating that traditional web social engineering attacks can be amplified into dangerous, reproducible threats when ex…
The paper introduces GuardPhish, a large-scale dataset and evaluation framework, demonstrating that even high-performing open-source LLMs can generate actionable phishing content despite accurate inte…
Tri Cao, Yulin Chen, Hieu Cao, Yibo Li +7 more
The paper proposes WARD, a robust and efficient defense model that secures web agents against prompt injection attacks embedded in web content, achieving high recall and low false positives even again…
Oubo Ma, Ruixiao Lin, Jiahao Chen, Yuan Su +2 more
The paper proposes IntraGuard, a black-box, venue-agnostic defense framework that embeds hidden instructions into manuscripts via PDF structure to disrupt AI-generated peer reviews, achieving up to 84…
The paper introduces presidio-hardened-x402, an open-source middleware that intercepts x402 payment requests to detect and redact PII and enforce spending policies before on-chain settlement.
The paper introduces WebPII, a novel, large-scale synthetic benchmark for detecting personally identifiable information (PII) in web screenshots, and demonstrates a model (WebRedact) that significantl…
The paper proposes CAMP, a cross-turn privacy framework that mitigates Cumulative PII Exposure (CPE) in multi-turn LLM conversations by tracking and masking accumulated personal data across the entire…
Yulin Chen, Tri Cao, Haoran Li, Yue Liu +6 more
The paper introduces WebAgentGuard, a novel reasoning-driven, multimodal guard model that effectively detects prompt injection attacks in vulnerable web agents without compromising their functionality…
Mengyao Du, Han Fang, Haokai Ma, Jiahao Chen +3 more
SnapGuard proposes a lightweight, multimodal method to detect prompt injection attacks in screenshot-based web agents by analyzing visual stability and contrast-polarity textual signals, achieving hig…
The paper introduces CAT, a novel coverage-guided fuzzing tool that overcomes the limitations of existing fuzzers for complex, multi-object cryptographic repositories like RPKI, leading to the discove…
Steven Seiden, Triss Ren, Caroline Zhang, Taein Kim +2 more
The paper proposes a novel, scalable technique using unique canary tokens to automatically and accurately identify which web scrapers are feeding data to specific Large Language Models (LLMs).
Yanming Mu, Hao Hu, Feiyang Li, Qiao Yuan +6 more
This paper provides the first comprehensive, end-to-end survey dedicated to the security of Retrieval-Augmented Generation (RAG) systems, systematically mapping threats, defenses, and benchmarks acros…
SilentRetrieval introduces a sophisticated, two-stage data poisoning attack that successfully hijacks Retrieval-Augmented Generation (RAG) systems by injecting adversarially crafted, yet highly fluent…