cs.CR

Cross-Modal Backdoors in Multimodal Large Language Models

May 8, 2026

AI Summary (gemma4:e4b)

The paper proposes a cross-modal backdoor attack that poisons only the lightweight connector of a multimodal LLM, reporting up to 99.9% attack success rate in same-modality settings and above 95.0% in most cross-modal settings.

Abstract


Developers increasingly construct multimodal large language models (MLLMs) by assembling pretrained components, introducing supply-chain attack surfaces. Existing security research primarily focuses on poisoning backbones such as encoders or large language models (LLMs), while the security risks of lightweight connectors remain unexplored. In this work, we propose a novel cross-modal backdoor attack that exploits this overlooked vulnerability. By poisoning only the connector using a single seed sample and several augmented variants from one modality, the adversary can subsequently activate the backdoor using inputs from other modalities. To achieve this, we first poison the connector to associate a compact latent region with a malicious target output. To activate the backdoor from other modalities, we further extract a malicious centroid from the poisoned latent representations and perform input-side optimization to steer inputs toward this latent anchor, without requiring repeated API queries or full-model access. Extensive evaluations on representative connector-based MLLM architectures, including PandaGPT and NExT-GPT, demonstrate both the effectiveness and cross-modal transferability of the proposed attack. The attack achieves up to 99.9% attack success rate (ASR) in same-modality settings, while most cross-modal settings exceed 95.0% ASR under bounded perturbations. Moreover, the attack remains highly stealthy, producing negligible leakage on clean inputs, and maintaining weight-cosine similarity above 0.97 relative to benign connectors. We further show that existing defense strategies fail to effectively mitigate this threat without incurring substantial utility degradation. These findings reveal a fundamental vulnerability in multimodal alignment: a single compromised connector can establish a reusable latent-space backdoor pathway across modalities, highlighting the need for safer modular MLLM design.
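The abstract's latent-anchor idea (average the poisoned latent representations into a centroid, then optimize an input under a perturbation bound so its latent moves toward that centroid) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: a fixed linear map `W` stands in for the encoder plus connector, the bound is taken to be an L-infinity ball of radius `eps`, the optimizer is plain signed-gradient descent, and the names `latent_centroid`, `project`, `steer_to_anchor`, and `weight_cosine` are hypothetical. The last helper shows one plausible reading of the weight-cosine similarity metric, computed over flattened weight vectors.

```python
import math


def latent_centroid(latents):
    """Component-wise mean of equal-length latent vectors."""
    n = len(latents)
    return [sum(col) / n for col in zip(*latents)]


def project(W, x):
    """Toy stand-in for encoder + connector (assumption): the linear map W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def _sign(v):
    return 0.0 if v == 0 else math.copysign(1.0, v)


def steer_to_anchor(x, W, anchor, eps=0.1, step=0.01, iters=200):
    """Signed-gradient descent on ||W(x + delta) - anchor||^2 with |delta_i| <= eps.

    Returns the perturbed input with the lowest loss seen during the search.
    """
    delta = [0.0] * len(x)
    best_delta, best_loss = list(delta), float("inf")
    for _ in range(iters + 1):
        z = project(W, [xi + di for xi, di in zip(x, delta)])
        residual = [zi - ai for zi, ai in zip(z, anchor)]
        loss = sum(r * r for r in residual)
        if loss < best_loss:
            best_loss, best_delta = loss, list(delta)
        # Gradient of the squared distance with respect to delta: 2 * W^T residual.
        grad = [
            2.0 * sum(W[i][j] * residual[i] for i in range(len(W)))
            for j in range(len(x))
        ]
        # Take a signed step, then clip back into the L-infinity ball.
        delta = [
            max(-eps, min(eps, d - step * _sign(g)))
            for d, g in zip(delta, grad)
        ]
    return [xi + di for xi, di in zip(x, best_delta)]


def weight_cosine(a, b):
    """Cosine similarity between two flattened weight vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    anchor = latent_centroid([[0.04, -0.06], [0.06, -0.04]])
    print(steer_to_anchor([0.0, 0.0], W, anchor))
```

In a real MLLM the linear map would be replaced by the frozen encoder and connector, and the gradient would come from automatic differentiation; the sketch only shows the shape of the centroid-then-steer loop and how the perturbation bound is enforced by clipping.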
