The paper introduces CrossMPI, a novel cross-modal prompt injection attack that uses image-only perturbations to steer the interpretation of both textual and visual inputs in Large Vision-Language Models (LVLMs).
Large vision-language models (LVLMs) have emerged as a powerful paradigm for multimodal intelligence, but their growing deployment also expands the attack surface for prompt injection. Existing attacks, however, suffer from a critical limitation: a prompt injected into one modality steers only the model's interpretation of that same input, and attacks that do span multiple modalities still fail to achieve cross-modal prompt perturbation. To bridge this gap, we introduce CrossMPI, a novel cross-modal prompt injection attack that steers the model's interpretation of both textual and visual inputs through image-only prompt injection. Our design rests on the following key ideas. First, we shift the target of the injected-prompt perturbation optimization from the visual embedding space (typically only $10^5$ parameters) to the model's hidden-state space, where multimodal information is integrated ($10^7$ parameters). Second, we adopt two strategies to mitigate the optimization challenges posed by this larger space. To constrain the optimized parameter space within the model, we introduce a layer selection strategy that identifies the layers most critical to multimodal integration; contrary to prior experience, our analysis reveals that the optimal layers for LVLM prompt perturbation lie in the middle of the model rather than at the end. To constrain the image perturbation space, we propose a distance-decremental perturbation budget assignment strategy that allocates progressively smaller budgets as a pixel's distance from semantically critical regions increases. Extensive experiments across multiple LVLMs and datasets show that our method significantly outperforms baseline approaches.
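The distance-decremental budget assignment can be illustrated with a small NumPy sketch. The abstract does not specify the decay schedule, how semantically critical regions are located, or how the budgets enter the optimizer, so the following are assumptions: the critical regions arrive as a precomputed boolean mask, the budget is a per-pixel L-infinity bound that decays exponentially with Euclidean pixel distance, and the function names and the values of `eps_max`, `eps_min` and `tau` are hypothetical. The sketch covers only the budget map and the projection step; the hidden-state objective and the layer selection are not shown.

```python
import numpy as np


def distance_to_critical(mask: np.ndarray) -> np.ndarray:
    """Euclidean pixel distance from every pixel to the nearest critical pixel.

    `mask` is a boolean (H, W) array that is True on semantic-critical pixels
    (assumption: the mask is given; how it is computed is not shown here).
    This brute-force version costs O(H * W * K) for K critical pixels; for
    full-size images, scipy.ndimage.distance_transform_edt(~mask) gives the
    same map far more cheaply.
    """
    if not mask.any():
        raise ValueError("mask must mark at least one critical pixel")
    h, w = mask.shape
    ys, xs = np.nonzero(mask)              # critical-pixel coordinates, shape (K,)
    grid_y, grid_x = np.mgrid[0:h, 0:w]    # every pixel's coordinates, shape (H, W)
    dy = grid_y[..., None] - ys            # pairwise offsets, shape (H, W, K)
    dx = grid_x[..., None] - xs
    return np.sqrt(dy ** 2 + dx ** 2).min(axis=-1)


def decremental_budget(mask, eps_max=16 / 255, eps_min=2 / 255, tau=8.0):
    """Per-pixel L-infinity budget that decays with distance to critical regions.

    Hypothetical schedule: eps(d) = eps_min + (eps_max - eps_min) * exp(-d / tau),
    so critical pixels (d = 0) get eps_max and far-away pixels approach eps_min.
    """
    d = distance_to_critical(mask)
    return eps_min + (eps_max - eps_min) * np.exp(-d / tau)


def project(delta: np.ndarray, budget: np.ndarray) -> np.ndarray:
    """Clip an (H, W, C) perturbation into the per-pixel box [-budget, budget]."""
    b = budget[..., None]                  # broadcast the (H, W) budget over channels
    return np.clip(delta, -b, b)


# Toy usage: a 32x32 RGB image whose critical region is a central 8x8 square.
mask = np.zeros((32, 32), dtype=bool)
mask[12:20, 12:20] = True
budget = decremental_budget(mask)
rng = np.random.default_rng(0)
delta = project(rng.uniform(-1.0, 1.0, size=(32, 32, 3)), budget)
```

In a PGD-style loop, `project` would be applied after each gradient step so that no pixel's perturbation exceeds its own budget. The exponential decay is only one choice; a linear or step-wise schedule fits the same description as long as it is non-increasing in distance.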