Papers similar to 2605.19698v1

~ similar to 2605.19698v1· 20 results

cs.CRcs.CVRecentApr 14, 2026

Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling

Zida Li, Jun Li, Yuzhe Sha, Ziqiang Li +2 more

The paper introduces SET, a robust input-level backdoor detection framework that detects hidden malicious triggers in text-to-image diffusion models by analyzing systematic differences in how benign a…

View →

cs.CLcs.CRcs.LGRecentMar 29, 2026

Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models

Duanyi Yao, Changyue Li, Zhicong Huang, Cheng Hong +1 more

The paper introduces Hidden Ads, a novel backdoor attack for Vision-Language Models (VLMs) that injects unauthorized advertisements by exploiting natural, recommendation-seeking user behaviors, mainta…

View →

cs.CRRecentMay 10, 2026

BadDLM: Backdooring Diffusion Language Models with Diverse Targets

Shengfang Zhai, Xiaoyang Ji, Yuling Shi, Haoran Gao +5 more

The paper introduces BadDLM, a unified framework that demonstrates a new class of backdoor vulnerabilities in Diffusion Language Models (DLMs) by exploiting their forward masking process across divers…

View →

cs.CRcs.LGRecentApr 7, 2026

Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models

Yiyang Zhang, Chaojian Yu, Ziming Hong, Yuanjie Shao +3 more

The paper proposes a novel Text-Guided Backdoor (TGB) attack that uses common words in text descriptions as stealthy triggers for multimodal models, enhancing practicality and controllability.

View →

cs.CLcs.AIcs.CRRecentMay 8, 2026

Activation Differences Reveal Backdoors: A Comparison of SAE Architectures

Sachin Kumar

The paper compares two sparse autoencoder architectures, finding that Differential SAEs (Diff-SAE) significantly outperform Crosscoders in isolating backdoor-related features in language models.

View →

cs.CRcs.CVRecentApr 17, 2026

TwoHamsters: Benchmarking Multi-Concept Compositional Unsafety in Text-to-Image Models

Chaoshuo Zhang, Yibo Liang, Mengke Tian, Chenhao Lin +5 more

This paper introduces TwoHamsters, a new benchmark that rigorously tests Multi-Concept Compositional Unsafety (MCCU) in text-to-image models, demonstrating that current state-of-the-art models and saf…

View →

cs.CRRecentMay 8, 2026

Cross-Modal Backdoors in Multimodal Large Language Models

Runhe Wang, Li Bai, Haibo Hu, Songze Li

The paper proposes a novel cross-modal backdoor attack that exploits the vulnerability of lightweight connectors in multimodal LLMs, demonstrating high attack success rates across different modalities…

View →

cs.CRcs.AIRecentMay 19, 2026

Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive Models

Tobias Braun, Jonas Henry Grebe, Hossein Shakibania, Anna Rohrbach +1 more

This paper introduces the Token by Token Backdoor Attack (ToBAC), demonstrating that unified autoregressive models (UAMs) are vulnerable to backdoor attacks where a single trigger can compromise multi…

View →

cs.CRRecentMar 17, 2026

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Guangsheng Zhang, Huan Tian, Leo Zhang, Tianqing Zhu +3 more

This paper systematically revisits and expands the threat model for backdoor attacks on semantic segmentation, proposing a unified framework (BADSEG) that demonstrates severe, previously overlooked vu…

View →

cs.CRcs.AIcs.LGEmpiricalRecentJul 28, 2026

Architectural Backdoors in Vision-Language Model Supply Chains via Representation Steering

Maria Rosaria Briglia, Igor Maljkovic, Antonio Emanuele Cinà, Luca Oneto +2 more

Researchers demonstrate how malicious providers can embed architectural backdoors into Vision--Language Model (VLM) supply chains through representation steering.

View →

cs.CVcs.CRRecentMay 11, 2026

What Concepts Lie Within? Detecting and Suppressing Risky Content in Diffusion Transformers

Chenyu Zhang

The paper proposes AHV-D&S, a novel training-free inference-time safeguard that detects and suppresses risky content in Diffusion Transformers (DiTs) by quantifying token sensitivity across attention…

View →

cs.CRcs.AIcs.CLRecentApr 23, 2026

Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

Jiali Wei, Ming Fan, Guoheng Sun, Xicheng Zhang +2 more

The paper introduces BadStyle, a novel backdoor attack framework that generates natural, stealthy poisoned samples using LLMs to compromise various LLMs with high success rates and robust activation.

View →

cs.LGcs.AIcs.CRRecentMay 8, 2026

Trapping Attacker in Dilemma: Examining Internal Correlations and External Influences of Trigger for Defending GNN Backdoors

Fan Yang, Binyan Xu, Di Tang, Kehuan Zhang

The paper proposes PRAETORIAN, a novel defense mechanism for Graph Neural Networks (GNNs) that targets the intrinsic structural requirements of backdoor attacks, significantly reducing the attack succ…

View →

cs.LGcs.CRRecentMay 27, 2026

Density-aware Sample-specific Attack

Qiyuan Wang, Yao Li, Raymond K. W. Wong

This paper proposes a density-aware attack that constructs triggers by placing poisoned samples in low-density regions of the clean data distribution, achieving high attack success rates even after st…

View →

cs.CRcs.AIRecentApr 23, 2026

CSC: Turning the Adversary's Poison against Itself

Yuchen Shi, Xin Guo, Huajie Chen, Tianqing Zhu +2 more

The paper proposes Cluster Segregation Concealment (CSC), a novel defense that identifies and neutralizes backdoor triggers by relabeling poisoned samples to a virtual class, achieving near-zero attac…

View →

cs.CRcs.AIEmpiricalRecentJul 22, 2026

Defense Against LLM Backdoors using Critical Neuron Isolation Pruning

Yuxi Li, Zhibo Zhang, Kailong Wang, Xingshuo Han +2 more

The paper introduces DeCNIP, a method for identifying and neutralizing backdoors in large language models using representational analysis and neuron isolation pruning.

View →

cs.CRcs.AIRecentApr 10, 2026

BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning

Guiyao Tie, Jiawen Shi, Pan Zhou, Lichao Sun

The paper introduces BadSkill, a novel backdoor attack formulation that targets third-party agent skills by poisoning the embedded model artifacts, achieving high attack success rates across various m…

View →

cs.CRcs.AIcs.CVRecentApr 10, 2026

CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion

Akshit Jindal, Saket Anand, Chetan Arora, Vikram Goyal

CLIP-Inspector (CI) is a novel model-level backdoor detection method that reconstructs potential triggers using out-of-distribution (OOD) images to verify the security of prompt-tuned CLIP models.

View →

cs.CRRecentApr 10, 2026

Unreal Thinking: Chain-of-Thought Hijacking via Two-stage Backdoor

Wenhan Chang, Tianqing Zhu, Ping Xiong, Faqian Guan +1 more

The paper proposes Two-stage Backdoor Hijacking (TSBH) to create persistent, trigger-activated malicious behaviors by manipulating the observable Chain-of-Thought (CoT) process in Large Language Model…

View →

cs.CRRecentApr 15, 2026

BackFlush: Knowledge-Free Backdoor Detection and Elimination with Watermark Preservation in Large Language Models

Jagadeesh Rachapudi, Ritali Vatsi, Pranav Singh, Praful Hambarde +1 more

BackFlush introduces a novel, knowledge-free framework that detects and eliminates unknown backdoor attacks in LLMs while simultaneously preserving existing watermarks, achieving high detection rates…

View →