Papers similar to 2603.27522v1

~ similar to 2603.27522v1· 20 results

cs.CRcs.LGRecentMay 19, 2026

Awakening the Hydra: Stabilizing Multi-Concept Backdoor Injection in Text-to-Image Diffusion Models

Kai Wang, Jiale Zhang, Chengcheng Zhu, Chuang Ma +1 more

The paper proposes Hydra, a framework to stabilize and control the injection of multiple, conflicting backdoor triggers into text-to-image diffusion models, ensuring high attack reliability while main…

View →

cs.CRRecentMay 10, 2026

BadDLM: Backdooring Diffusion Language Models with Diverse Targets

Shengfang Zhai, Xiaoyang Ji, Yuling Shi, Haoran Gao +5 more

The paper introduces BadDLM, a unified framework that demonstrates a new class of backdoor vulnerabilities in Diffusion Language Models (DLMs) by exploiting their forward masking process across divers…

View →

cs.CRcs.AIcs.CLRecentApr 23, 2026

Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

Jiali Wei, Ming Fan, Guoheng Sun, Xicheng Zhang +2 more

The paper introduces BadStyle, a novel backdoor attack framework that generates natural, stealthy poisoned samples using LLMs to compromise various LLMs with high success rates and robust activation.

View →

cs.CRcs.AIRecentMay 19, 2026

Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive Models

Tobias Braun, Jonas Henry Grebe, Hossein Shakibania, Anna Rohrbach +1 more

This paper introduces the Token by Token Backdoor Attack (ToBAC), demonstrating that unified autoregressive models (UAMs) are vulnerable to backdoor attacks where a single trigger can compromise multi…

View →

cs.CRcs.LGRecentApr 7, 2026

Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models

Yiyang Zhang, Chaojian Yu, Ziming Hong, Yuanjie Shao +3 more

The paper proposes a novel Text-Guided Backdoor (TGB) attack that uses common words in text descriptions as stealthy triggers for multimodal models, enhancing practicality and controllability.

View →

cs.CRRecentMay 8, 2026

Cross-Modal Backdoors in Multimodal Large Language Models

Runhe Wang, Li Bai, Haibo Hu, Songze Li

The paper proposes a novel cross-modal backdoor attack that exploits the vulnerability of lightweight connectors in multimodal LLMs, demonstrating high attack success rates across different modalities…

View →

cs.CRRecentMar 17, 2026

Poisoning the Pixels: Revisiting Backdoor Attacks on Semantic Segmentation

Guangsheng Zhang, Huan Tian, Leo Zhang, Tianqing Zhu +3 more

This paper systematically revisits and expands the threat model for backdoor attacks on semantic segmentation, proposing a unified framework (BADSEG) that demonstrates severe, previously overlooked vu…

View →

cs.CRcs.CLRecentMay 14, 2026

MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

Rui Wen, Mark Russinovich, Andrew Paverd, Jun Sakuma +1 more

The paper introduces MetaBackdoor, a novel class of LLM backdoor attacks that exploits positional encoding (length-based triggers) rather than requiring modifications to the textual content.

View →

cs.CRcs.AIcs.CVRecentMar 31, 2026

Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning

Kavindu Herath, Joshua Zhao, Saurabh Bagchi

This paper proposes SABLE, a method for generating semantically meaningful and in-distribution backdoor triggers for federated learning, demonstrating that such attacks remain a potent and practical t…

View →

cs.CRRecentApr 4, 2026

AttackEval: A Systematic Empirical Study of Prompt Injection Attack Effectiveness Against Large Language Models

Jackson Wang

AttackEval systematically evaluates the effectiveness of 250 prompt injection prompts across ten attack categories, finding that composite and obfuscation attacks are highly effective against current…

View →

cs.CRcs.AIcs.CLRecentMay 28, 2026

Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection

Travis Lelle

The paper demonstrates that LoRA adapters can be backdoored via data poisoning, showing the backdoor generalizes at the token feature level, and proposes robust behavioral and weight-level detectors f…

View →

cs.CRcs.AIcs.CLRecentMay 28, 2026

Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection

Travis Lelle

This paper demonstrates that LoRA adapters can be backdoored via data poisoning, showing that the resulting backdoor generalizes at the token feature level, and proposes robust behavioral and weight-l…

View →

cs.CRcs.CVRecentApr 14, 2026

Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling

Zida Li, Jun Li, Yuzhe Sha, Ziqiang Li +2 more

The paper introduces SET, a robust input-level backdoor detection framework that detects hidden malicious triggers in text-to-image diffusion models by analyzing systematic differences in how benign a…

View →

cs.CRcs.AIRecentApr 21, 2026

ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

Kun Wang, Cheng Qian, Miao Yu, Lilan Peng +5 more

The paper introduces ProjLens, an interpretability framework that reveals that backdoor vulnerabilities in Multimodal Large Language Models (MLLMs) are encoded within a low-rank subspace of the projec…

View →

cs.CRRecentApr 29, 2026

Indirect Prompt Injection in the Wild: An Empirical Study of Prevalence, Techniques, and Objectives

Soheil Khodayari, Xuenan Zhang, Bhupendra Acharya, Giancarlo Pellegrino

This paper provides a large-scale empirical analysis of indirect prompt injections found in webpages, revealing that prompt-based interference is a widespread, persistent, and growing threat targeting…

View →

cs.CRRecentApr 9, 2026

Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction

Diana Romero, Mutahar Ali, Momin Ahmad Khan, Habiba Farrukh +2 more

This paper introduces the first backdoor attacks against VLM-based scanpath prediction, demonstrating variable-output attacks that evade detection and survive deployment on edge devices.

View →

cs.CRcs.AIcs.LGRecentMay 18, 2026

Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks

John T. Halloran, Noopur S. Bhatt

The paper proposes Open-Book Benign Rewriting (OBBR), a novel defense mechanism that uses LLM rewriting with benign samples to neutralize data poisoning attacks against LLMs, significantly improving s…

View →

cs.CRcs.AIRecentMay 8, 2026

WebTrap: Stealthy Mid-Task Hijacking of Browser Agents During Navigation

Zhichao Liu, Wenbo Pan, Haining Yu, Ge Gao +2 more

WebTrap introduces a stealthy, mid-task hijacking attack that successfully compromises browser agents during long-horizon tasks by seamlessly fusing malicious instructions with the original user goal.

View →

cs.CRcs.AIcs.CLRecentMay 21, 2026

Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

Aaditya Pai

The paper identifies a critical vulnerability, the Camouflage Detection Gap (CDG), where standard LLM injection detectors fail dramatically when malicious payloads mimic the target domain's language a…

View →

cs.CLcs.AIcs.CRRecentMay 8, 2026

Activation Differences Reveal Backdoors: A Comparison of SAE Architectures

Sachin Kumar

The paper compares two sparse autoencoder architectures, finding that Differential SAEs (Diff-SAE) significantly outperform Crosscoders in isolating backdoor-related features in language models.

View →