~ similar to 2605.02374v1· 20 results
The paper evaluates four RAG architectures under knowledge base poisoning, demonstrating that advanced architectures significantly improve robustness against adversarial contradictions, localizing the…
Yiyang Zhang, Chaojian Yu, Ziming Hong, Yuanjie Shao +3 more
The paper proposes a novel Text-Guided Backdoor (TGB) attack that uses common words in text descriptions as stealthy triggers for multimodal models, enhancing practicality and controllability.
Yuanfan Li, Qi Zhou, Chengzhengxu Li, Zhaohan Zhang +4 more
The paper introduces MGTEVAL, a comprehensive and extensible platform designed to systematically evaluate the performance, robustness, and efficiency of machine-generated text detectors.
The paper proposes an unsupervised method using multiple statistical indicators to detect adversarial or compromised context documents in Retrieval Augmented Generation (RAG) systems, even without kno…
Tri Cao, Yulin Chen, Hieu Cao, Yibo Li +7 more
The paper proposes WARD, a robust and efficient defense model that secures web agents against prompt injection attacks embedded in web content, achieving high recall and low false positives even again…
The paper proposes a unified, architecture-agnostic framework that significantly improves the robustness of deepfake image detectors against adversarial attacks by focusing on higher-order frequency s…
The paper proposes TAGBD, a graph-aware backdoor attack that demonstrates that inconspicuous poison text alone can reliably compromise text-attributed graph learning systems.
The paper proposes a general-purpose pipeline to train automated red teaming models capable of generating attacks for arbitrary adversarial goals, overcoming the limitations of current methods that ar…
AttackEval systematically evaluates the effectiveness of 250 prompt injection prompts across ten attack categories, finding that composite and obfuscation attacks are highly effective against current…
RefineRAG introduces a novel word-level poisoning framework that significantly enhances knowledge poisoning attacks against RAG systems, achieving state-of-the-art effectiveness and transferability to…
Luze Sun, Anshuman Suri, Harsh Chaudhari, Cristina Nita-Rotaru +1 more
The paper introduces PoisonForge, a comprehensive benchmark demonstrating that even a small number of targeted poisoned examples can significantly compromise the safety and reliability of instruction-…
Zhe Yu, Wenpeng Xing, Gaolei Li, Shuguang Xiong +3 more
The paper introduces CORDON-MAS, a compartmentalized framework that defends Retrieval-Augmented Generation (RAG) against knowledge poisoning by enforcing strict information-flow control, significantly…
The paper introduces Auto-ART, a comprehensive open-source framework that provides structured meta-analysis and automated testing for adversarial robustness, revealing significant gaps in current ML s…
The paper introduces THREAT, a novel reasoning-driven framework that efficiently discovers highly effective and targeted jailbreak prompts for LLMs, revealing previously unknown safety vulnerabilities…
The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…
This paper proposes the first web-focused threat model for agentic browsers, demonstrating that traditional web social engineering attacks can be amplified into dangerous, reproducible threats when ex…
LocalAlign proposes a generalizable prompt injection defense by generating near-target adversarial examples, which enforces a tighter robustness boundary around the correct model response.
Mengyao Du, Han Fang, Haokai Ma, Jiahao Chen +3 more
SnapGuard proposes a lightweight, multimodal method to detect prompt injection attacks in screenshot-based web agents by analyzing visual stability and contrast-polarity textual signals, achieving hig…
Chengcai Gao, Zhihong Sun, Xiaochuan Shi, Qiufeng Wang +1 more
The paper proposes BiRD, a bidirectional ranking defense mechanism that enhances the robustness of Retrieval-Augmented Generation (RAG) against adversarial attacks by analyzing the alignment between f…
Yong Zou, Haoran Li, Fanxiao Li, Shenyang Wei +4 more
The paper introduces REFORGE, a black-box red-teaming framework that uses adversarial image prompts to reveal persistent vulnerabilities in current Image Generation Model Unlearning (IGMU) methods.