~ similar to 2605.10600v1· 20 results
Aishwarya Agrawal, Roy Hirsch, Yasumasa Onoe, Sherry Ben +1 more
The paper introduces TECCI, a novel and challenging benchmark dataset of 7550 image-edit pairs, and demonstrates that current state-of-the-art text-guided image editing models struggle significantly w…
Xinlei Guan, David Arosemena, Tejaswi Dhandu, Kuan Huang +6 more
The paper proposes an end-to-end forensic pipeline using steganographic attribution and multimodal harm detection to reliably trace and attribute harmful misuse of AI-generated imagery on social platf…
The paper introduces a dual-dimension evaluation for universal adversarial attacks on Vision-Language Models (VLMs), demonstrating that high reported attack success rates significantly overestimate th…
The paper proposes TAGBD, a graph-aware backdoor attack that demonstrates that inconspicuous poison text alone can reliably compromise text-attributed graph learning systems.
Mengyao Du, Han Fang, Haokai Ma, Jiahao Chen +3 more
SnapGuard proposes a lightweight, multimodal method to detect prompt injection attacks in screenshot-based web agents by analyzing visual stability and contrast-polarity textual signals, achieving hig…
Weilin Lin, Ziqi Lin, Zhenxing Zhou, Jianze Li +3 more
The paper introduces RedEdit, an agentic red-teaming framework that demonstrates that malicious images can be easily edited to bypass safety classifiers while retaining their harmful semantics.
Duanyi Yao, Changyue Li, Zhicong Huang, Cheng Hong +1 more
The paper introduces Hidden Ads, a novel backdoor attack for Vision-Language Models (VLMs) that injects unauthorized advertisements by exploiting natural, recommendation-seeking user behaviors, mainta…
This paper re-evaluates prompt-injection attacks in realistic RAG settings, finding that most prior attack methods fail to reach the generator, and that current attacks are easily detectable.
The paper introduces ImageProtector, a user-side method that embeds an imperceptible perturbation into images to prevent Multi-modal Large Language Models (MLLMs) from analyzing and extracting sensiti…
Yue Li, Linying Xue, Kaiqing Lin, Hanyu Quan +4 more
The paper proposes AEGIS, a novel diffusion-guided method for injecting adversarial perturbations into the latent space to create generalizable and robust defenses against advanced facial deepfake man…
Hao Yang, Zhuo Ma, Yang Liu, Yilong Yang +2 more
The paper introduces CrossMPI, a novel cross-modal prompt injection attack that uses image-only perturbations to steer the interpretation of both textual and visual inputs in Large Vision-Language Mod…
Yiming Wang, Baiqi Wu, Qingming Li, Jiahao Chen +2 more
The paper proposes FLAME, a novel framework that detects AI-generated image forgeries by identifying intrinsic energy anomalies caused by the diffusion process, achieving state-of-the-art localization…
The paper introduces Tree structured Injection for Payloads (TIP), a novel black-box attack framework that reliably generates stealthy injection payloads to seize control of LLM agents utilizing the M…
This paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically analyzing how the injection depth and payload framing affect attack success rates, finding that inje…
The paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically varying the injection depth, payload framing, and turn budget, finding that injection depth is the do…
This paper provides a large-scale empirical analysis of indirect prompt injections found in webpages, revealing that prompt-based interference is a widespread, persistent, and growing threat targeting…
Yong Zou, Haoran Li, Fanxiao Li, Shenyang Wei +4 more
The paper introduces REFORGE, a black-box red-teaming framework that uses adversarial image prompts to reveal persistent vulnerabilities in current Image Generation Model Unlearning (IGMU) methods.
Kai Wang, Jiale Zhang, Chengcheng Zhu, Chuang Ma +1 more
The paper proposes Hydra, a framework to stabilize and control the injection of multiple, conflicting backdoor triggers into text-to-image diffusion models, ensuring high attack reliability while main…
The paper introduces a kill-chain canary methodology to diagnose prompt injection vulnerabilities across multi-stage LLM pipelines, revealing that write-node placement and document format are critical…
The paper introduces SEED, a large-scale benchmark dataset for tracing sequential deepfake facial edits, and proposes FAITH, a frequency-aware Transformer model that effectively detects and orders the…