~ similar to 2603.16576v1· 20 results
The paper proposes a unified, architecture-agnostic framework that significantly improves the robustness of deepfake image detectors against adversarial attacks by focusing on higher-order frequency s…
The paper introduces a dual-dimension evaluation for universal adversarial attacks on Vision-Language Models (VLMs), demonstrating that high reported attack success rates significantly overestimate th…
The paper introduces ImageProtector, a user-side method that embeds an imperceptible perturbation into images to prevent Multi-modal Large Language Models (MLLMs) from analyzing and extracting sensiti…
The paper introduces WaveGuard, a frequency-aware, single-pass defense framework that safeguards text-to-image models by injecting structured, imperceptible perturbations into generated images, thereb…
Chaoshuo Zhang, Yibo Liang, Mengke Tian, Chenhao Lin +5 more
This paper introduces TwoHamsters, a new benchmark that rigorously tests Multi-Concept Compositional Unsafety (MCCU) in text-to-image models, demonstrating that current state-of-the-art models and saf…
Desen Sun, Jason Hon, Howe Wang, Saarth Rajan +2 more
This paper investigates a novel security vulnerability where imperceptible branding hints can be injected into images and subsequently re-rendered onto new objects by generative AI models, proposing b…
Kai Wang, Jiale Zhang, Chengcheng Zhu, Chuang Ma +1 more
The paper proposes Hydra, a framework to stabilize and control the injection of multiple, conflicting backdoor triggers into text-to-image diffusion models, ensuring high attack reliability while main…
This survey provides a comprehensive taxonomy and vulnerability-centric analysis of adversarial attacks targeting Multimodal Large Language Models (MLLMs), offering an explanatory framework for enhanc…
Ahmed Sabbah, Mohammed Kharma, Radi Jarrar, Samer Zein +1 more
This study longitudinally evaluates the adversarial robustness of Android malware detection systems over a decade, finding that temporal separation significantly degrades robustness due to concept dri…
The paper proposes AHV-D&S, a novel training-free inference-time safeguard that detects and suppresses risky content in Diffusion Transformers (DiTs) by quantifying token sensitivity across attention…
The paper demonstrates that current defenses against malicious fine-tuning of foundation models are insufficient because they only address fixed attacks, and introduces a unified adaptive attack that…
Yunrui Yu, Xuxiang Feng, Pengda Qin, Pengyang Wang +4 more
The paper introduces Dummy-Aware Weighted Attack (DAWA), a novel evaluation method that significantly reduces the reported robustness of Dummy Classes-based defenses by simultaneously targeting both t…
The paper proposes a general-purpose pipeline to train automated red teaming models capable of generating attacks for arbitrary adversarial goals, overcoming the limitations of current methods that ar…
The paper proposes a universal robustification framework to enhance drift-adaptive malware detectors against combined concept drift and adversarial attacks, significantly reducing attack success rates…
This paper introduces 'unlearning corruption attacks,' demonstrating that the performance degradation inherent in approximate graph unlearning can be exploited by an adversary to significantly reduce…
The paper proposes REACT, an adversarial training framework that significantly enhances the robustness and few-shot performance of machine-generated text detection by having a Retrieval-Augmented Gene…
This paper demonstrates that existing open-weight LLM safeguards are vulnerable to simple, non-gradient-based attacks like abliteration and prefilling, significantly increasing the attack success rate…
RefineRAG introduces a novel word-level poisoning framework that significantly enhances knowledge poisoning attacks against RAG systems, achieving state-of-the-art effectiveness and transferability to…
The paper evaluates the adversarial robustness of two open-source Vision-Language Models (LLaVA and Qwen2.5-VL) in a simulated e-commerce environment, finding that while LLaVA is vulnerable to gradien…
The paper introduces Multi-Clip Video (MCV) SafetyBench, a dataset demonstrating that the vulnerability of Multimodal Large Language Models (MLLMs) to jailbreaking increases with the diversity and num…