Similar to 2605.10180v1 · 19 results
The paper proposes SafeDIG, a robust safety steering framework that adapts Diffusion Transformers for text-to-image generation by treating safety control as position-aware sparse feature transfer, ens…
Ye Leng, Junjie Chu, Mingjie Li, Chenhao Lin +4 more
The paper finds that while multimodal large language models (MLLMs) offer superior semantic understanding for image generation, this enhanced capability significantly increases safety risks, partic…
Chaoshuo Zhang, Yibo Liang, Mengke Tian, Chenhao Lin +5 more
This paper introduces TwoHamsters, a new benchmark that rigorously tests Multi-Concept Compositional Unsafety (MCCU) in text-to-image models, demonstrating that current state-of-the-art models and saf…
Kai Wang, Jiale Zhang, Chengcheng Zhu, Chuang Ma +1 more
The paper proposes Hydra, a framework to stabilize and control the injection of multiple, conflicting backdoor triggers into text-to-image diffusion models, ensuring high attack reliability while main…
The paper proposes Neighbor-Aware Localized Concept Erasure (NLCE), a training-free framework that effectively removes specific concepts from text-to-image models while minimizing the unintended degra…
Zida Li, Jun Li, Yuzhe Sha, Ziqiang Li +2 more
The paper introduces SET, a robust input-level backdoor detection framework that detects hidden malicious triggers in text-to-image diffusion models by analyzing systematic differences in how benign a…
The paper introduces ImageProtector, a user-side method that embeds an imperceptible perturbation into images to prevent Multi-modal Large Language Models (MLLMs) from analyzing and extracting sensiti…
Jun Li, Lizhi Xiong, Ziqiang Li, Weiwei Jiang +3 more
The paper introduces TICoE, a text-image collaborative framework that achieves precise and faithful concept removal from text-to-image generative models, surpassing existing methods in both precision…
Yong Zou, Haoran Li, Fanxiao Li, Shenyang Wei +4 more
The paper introduces REFORGE, a black-box red-teaming framework that uses adversarial image prompts to reveal persistent vulnerabilities in current Image Generation Model Unlearning (IGMU) methods.
The paper demonstrates that content suppression techniques used in language models only mask prohibited content at the output level, failing to eliminate the underlying concepts from the model's inter…
CoreUnlearn is a framework that disentangles and removes undesirable concepts from text-guided diffusion models by targeting specific, erasure-critical components of the concept embeddin…
This paper introduces the Token by Token Backdoor Attack (ToBAC), demonstrating that unified autoregressive models (UAMs) are vulnerable to backdoor attacks where a single trigger can compromise multi…
This paper demonstrates that Concept Bottleneck Models (CBMs), despite their interpretability, are highly vulnerable to targeted adversarial attacks that manipulate semantic concepts, and proposes SPE…
Guangsheng Zhang, Huan Tian, Leo Zhang, Tianqing Zhu +3 more
This paper systematically revisits and expands the threat model for backdoor attacks on semantic segmentation, proposing a unified framework (BADSEG) that demonstrates severe, previously overlooked vu…
Desen Sun, Jason Hon, Howe Wang, Saarth Rajan +2 more
This paper investigates a novel security vulnerability where imperceptible branding hints can be injected into images and subsequently re-rendered onto new objects by generative AI models, proposing b…
Xinlei Guan, David Arosemena, Tejaswi Dhandu, Kuan Huang +6 more
The paper proposes an end-to-end forensic pipeline using steganographic attribution and multimodal harm detection to reliably trace and attribute harmful misuse of AI-generated imagery on social platf…
The paper proposes a unified, architecture-agnostic framework that significantly improves the robustness of deepfake image detectors against adversarial attacks by focusing on higher-order frequency s…
Zhihao Wu, Gracia Gong, Qinglin Zhu, Yudong Chen +1 more
The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detect…
The paper introduces 'contrastive privacy,' a formal, model-agnostic, and quantitative method for evaluating the semantic success of AI-based sanitization across multiple media modalities.