The paper introduces AudioHijack, a framework that successfully demonstrates context-agnostic and imperceptible auditory prompt injection attacks, showing that commercial Large Audio-Language Models can be hijacked with high success rates.
Modern large audio-language models (LALMs) power intelligent voice interactions by tightly integrating audio and text. This integration, however, expands the attack surface beyond text and introduces vulnerabilities in the continuous, high-dimensional audio channel. While prior work has studied audio jailbreaks, the security risks of malicious audio injection and downstream behavior manipulation remain underexamined. In this work, we reveal a previously overlooked threat, auditory prompt injection, under the realistic constraints of audio data-only access and strong perceptual stealth. To systematically analyze this threat, we propose AudioHijack, a general framework that generates context-agnostic and imperceptible adversarial audio to hijack LALMs. AudioHijack employs sampling-based gradient estimation for end-to-end optimization across diverse models, bypassing non-differentiable audio tokenization. Through attention supervision and multi-context training, it steers model attention toward the adversarial audio and generalizes to unseen user contexts. We also design a convolutional blending method that modulates perturbations into natural reverberation, making them highly imperceptible to users. Extensive experiments on 13 state-of-the-art LALMs show consistent hijacking across 6 misbehavior categories, achieving average success rates of 79%-96% on unseen user contexts with high acoustic fidelity. Real-world studies demonstrate that commercial voice agents from Mistral AI and Microsoft Azure can be induced to execute unauthorized actions on behalf of users. These findings expose critical vulnerabilities in LALMs and highlight the urgent need for dedicated defenses.
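The convolutional blending idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it only assumes that a perturbation can be expressed as a short impulse response (a direct-path tap followed by a small decaying tail) and applied by convolution, so that the change to the signal resembles mild reverberation rather than additive noise. The function names, the random decaying tail, and all parameter values below are hypothetical; in an actual attack the tail would be optimized rather than sampled.

```python
import numpy as np


def blend_with_impulse_response(audio, tail, direct_gain=1.0):
    """Convolve audio with an impulse response made of a direct-path tap plus a tail.

    Hypothetical helper: `tail` stands in for the perturbation, which here
    acts like a short reverberation tail instead of additive noise.
    """
    audio = np.asarray(audio, dtype=np.float64)
    tail = np.asarray(tail, dtype=np.float64)
    # Impulse response: direct-path tap first, then the reverb-like tail.
    impulse_response = np.concatenate(([direct_gain], tail))
    # Full convolution, truncated to the input length so timing is preserved.
    return np.convolve(audio, impulse_response)[: len(audio)]


def decaying_tail(num_taps, decay=0.9, scale=0.05, seed=0):
    """Random tail with an exponentially decaying envelope (illustrative only)."""
    rng = np.random.default_rng(seed)
    envelope = decay ** np.arange(1, num_taps + 1)
    return scale * envelope * rng.standard_normal(num_taps)


# One second of a 440 Hz tone at an assumed 16 kHz sample rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = 0.5 * np.sin(2 * np.pi * 440.0 * t)

tail = decaying_tail(num_taps=64)
blended = blend_with_impulse_response(clean, tail)

# Signal-to-noise ratio of the clean signal against what the blending changed.
residual = blended - clean
snr_db = 10 * np.log10(np.sum(clean**2) / np.sum(residual**2))
print(f"SNR of blended audio vs. clean: {snr_db:.1f} dB")
```

Because the direct-path tap is 1, the residual is exactly the clean signal convolved with the tail, so shrinking `scale` or shortening the tail directly trades attack capacity for fidelity in this toy setup.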
TAAC: A gate into Trustable Audio Affective Computing
The paper proposes TAAC, a novel framework that enables accurate depression dete…
Perceptual Gaps: ASCII Art and Overlapping Audio as CAPTCHA
The paper proposes two novel CAPTCHA types—ASCII art and overlapping audio—and d…
Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey
This survey provides a comprehensive taxonomy and vulnerability-centric analysis…
STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling
STEP introduces a novel, black-box, retraining-free detector that profiles audio…
PlanGuard: Defending Agents against Indirect Prompt Injection via Planning-based Consistency Verific…
PlanGuard is a training-free defense framework that uses an isolated Planner and…
AgentWatcher: A Rule-based Prompt Injection Monitor
AgentWatcher is a novel, rule-based monitor designed to detect prompt injection…
ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Inject…
ClawGuard is a novel runtime security framework that deterministically enforces…
Are AI-assisted Development Tools Immune to Prompt Injection?
The paper empirically analyzes the susceptibility of seven widely used AI-assist…