~ similar to 2603.27918v1· 19 results
The paper proposes a novel cross-modal backdoor attack that exploits the vulnerability of lightweight connectors in multimodal LLMs, demonstrating high attack success rates across different modalities…
Ye Sun, Xin Wang, Jiaming Zhang, Yifeng Gao +6 more
DarkLLM introduces a novel framework that uses a Large Language Model (LLM) to translate natural language instructions into flexible, latent adversarial attack vectors, demonstrating a systemic vulner…
The paper introduces Multi-Clip Video (MCV) SafetyBench, a dataset demonstrating that the vulnerability of Multimodal Large Language Models (MLLMs) to jailbreaking increases with the diversity and num…
The paper introduces ImageProtector, a user-side method that embeds an imperceptible perturbation into images to prevent Multi-modal Large Language Models (MLLMs) from analyzing and extracting sensiti…
Wenzhuo Xu, Zhipeng Wei, Zonghao Ying, Deyue Zhang +3 more
The paper proposes DMN, a compositional jailbreak framework that utilizes distributed instructions, multimodal evidence, and a number chain task across multiple images to significantly enhance the att…
The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…
The paper evaluates the adversarial robustness of two open-source Vision-Language Models (LLaVA and Qwen2.5-VL) in a simulated e-commerce environment, finding that while LLaVA is vulnerable to gradien…
The paper introduces MLLM-Microscope, a system that analyzes the internal structure of multimodal large language models (MLLMs), finding that modality fusion significantly impacts the linearity and di…
Yani Wang, Yilong Yang, Yang Liu, Zhuzhu Wang +2 more
The paper introduces Distributed Semantic Recomposition (DSR), a novel cross-modal jailbreaking framework that bypasses existing safety filters by decomposing harmful intent into benign input componen…
This paper introduces ComicJailbreak, a new benchmark demonstrating that structured visual narratives can effectively jailbreak Multimodal Large Language Models (MLLMs), requiring new safety alignment…
This paper provides a unified taxonomy and controlled empirical evaluation of jailbreak attacks and defenses for Large Audio Language Models (LALMs), demonstrating that safety evaluation must consider…
This paper demonstrates that LLM cascade systems, designed for efficiency, are vulnerable to targeted adversarial attacks that simultaneously degrade both performance and cost-efficiency.
Ye Leng, Junjie Chu, Mingjie Li, Chenhao Lin +4 more
The paper analyzes that while multimodal large language models (MLLMs) offer superior semantic understanding for image generation, this enhanced capability significantly increases safety risks, partic…
The paper demonstrates that adversarial examples can be used to manipulate Vision-Language Models (VLMs) into confidently providing authoritative but incorrect information, a process termed 'AI author…
The paper introduces THREAT, a novel reasoning-driven framework that efficiently discovers highly effective and targeted jailbreak prompts for LLMs, revealing previously unknown safety vulnerabilities…
The paper empirically evaluates domain-adapted and general-purpose LLMs for structured threat modelling (STRIDE on 5G security), finding that domain adaptation and model size do not guarantee reliable…
Hao Yang, Zhuo Ma, Yang Liu, Yilong Yang +2 more
The paper introduces CrossMPI, a novel cross-modal prompt injection attack that uses image-only perturbations to steer the interpretation of both textual and visual inputs in Large Vision-Language Mod…
This paper addresses the vulnerability of existing LLM safety monitors to adaptive attackers and proposes activation watermarking, a technique that significantly improves detection robustness against…
SALLIE introduces a lightweight, modal-agnostic runtime detection framework that effectively safeguards LLMs and VLMs against both textual and visual jailbreaks and prompt injections without performan…