~ similar to 2605.19698v1· 20 results
Zida Li, Jun Li, Yuzhe Sha, Ziqiang Li +2 more
The paper introduces SET, a robust input-level backdoor detection framework that detects hidden malicious triggers in text-to-image diffusion models by analyzing systematic differences in how benign a…
Duanyi Yao, Changyue Li, Zhicong Huang, Cheng Hong +1 more
The paper introduces Hidden Ads, a novel backdoor attack for Vision-Language Models (VLMs) that injects unauthorized advertisements by exploiting natural, recommendation-seeking user behaviors, mainta…
Shengfang Zhai, Xiaoyang Ji, Yuling Shi, Haoran Gao +5 more
The paper introduces BadDLM, a unified framework that demonstrates a new class of backdoor vulnerabilities in Diffusion Language Models (DLMs) by exploiting their forward masking process across divers…
Yiyang Zhang, Chaojian Yu, Ziming Hong, Yuanjie Shao +3 more
The paper proposes a novel Text-Guided Backdoor (TGB) attack that uses common words in text descriptions as stealthy triggers for multimodal models, enhancing practicality and controllability.
The paper compares two sparse autoencoder architectures, finding that Differential SAEs (Diff-SAE) significantly outperform Crosscoders in isolating backdoor-related features in language models.
Chaoshuo Zhang, Yibo Liang, Mengke Tian, Chenhao Lin +5 more
This paper introduces TwoHamsters, a new benchmark that rigorously tests Multi-Concept Compositional Unsafety (MCCU) in text-to-image models, demonstrating that current state-of-the-art models and saf…
The paper proposes a novel cross-modal backdoor attack that exploits the vulnerability of lightweight connectors in multimodal LLMs, demonstrating high attack success rates across different modalities…
This paper introduces the Token by Token Backdoor Attack (ToBAC), demonstrating that unified autoregressive models (UAMs) are vulnerable to backdoor attacks where a single trigger can compromise multi…
Guangsheng Zhang, Huan Tian, Leo Zhang, Tianqing Zhu +3 more
This paper systematically revisits and expands the threat model for backdoor attacks on semantic segmentation, proposing a unified framework (BADSEG) that demonstrates severe, previously overlooked vu…
The paper proposes AHV-D&S, a novel training-free inference-time safeguard that detects and suppresses risky content in Diffusion Transformers (DiTs) by quantifying token sensitivity across attention…
Jiali Wei, Ming Fan, Guoheng Sun, Xicheng Zhang +2 more
The paper introduces BadStyle, a novel backdoor attack framework that generates natural, stealthy poisoned samples using LLMs to compromise various LLMs with high success rates and robust activation.
The paper proposes PRAETORIAN, a novel defense mechanism for Graph Neural Networks (GNNs) that targets the intrinsic structural requirements of backdoor attacks, significantly reducing the attack succ…
This paper proposes a density-aware attack that constructs triggers by placing poisoned samples in low-density regions of the clean data distribution, achieving high attack success rates even after st…
Yuchen Shi, Xin Guo, Huajie Chen, Tianqing Zhu +2 more
The paper proposes Cluster Segregation Concealment (CSC), a novel defense that identifies and neutralizes backdoor triggers by relabeling poisoned samples to a virtual class, achieving near-zero attac…
The paper introduces BadSkill, a novel backdoor attack formulation that targets third-party agent skills by poisoning the embedded model artifacts, achieving high attack success rates across various m…
CLIP-Inspector (CI) is a novel model-level backdoor detection method that reconstructs potential triggers using out-of-distribution (OOD) images to verify the security of prompt-tuned CLIP models.
Wenhan Chang, Tianqing Zhu, Ping Xiong, Faqian Guan +1 more
The paper proposes Two-stage Backdoor Hijacking (TSBH) to create persistent, trigger-activated malicious behaviors by manipulating the observable Chain-of-Thought (CoT) process in Large Language Model…
BackFlush introduces a novel, knowledge-free framework that detects and eliminates unknown backdoor attacks in LLMs while simultaneously preserving existing watermarks, achieving high detection rates…
Yong Zou, Haoran Li, Fanxiao Li, Shenyang Wei +4 more
The paper introduces REFORGE, a black-box red-teaming framework that uses adversarial image prompts to reveal persistent vulnerabilities in current Image Generation Model Unlearning (IGMU) methods.
The paper demonstrates that LoRA adapters can be backdoored via data poisoning, showing the backdoor generalizes at the token feature level, and proposes robust behavioral and weight-level detectors f…