Papers similar to 2605.14291v1

~ similar to 2605.14291v1· 20 results

cs.CVcs.AIcs.CRRecentApr 10, 2026

Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection

Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong

The paper introduces ImageProtector, a user-side method that embeds an imperceptible perturbation into images to prevent Multi-modal Large Language Models (MLLMs) from analyzing and extracting sensiti…

View →

cs.CRcs.AIRecentMar 30, 2026

Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey

Bhavuk Jain, Sercan Ö. Arık, Hardeo K. Thakur

This survey provides a comprehensive taxonomy and vulnerability-centric analysis of adversarial attacks targeting Multimodal Large Language Models (MLLMs), offering an explanatory framework for enhanc…

View →

cs.CRcs.AIRecentMar 17, 2026

Adversarial attacks against Modern Vision-Language Models

Alejandro Paredes La Torre

The paper evaluates the adversarial robustness of two open-source Vision-Language Models (LLaVA and Qwen2.5-VL) in a simulated e-commerce environment, finding that while LLaVA is vulnerable to gradien…

View →

cs.CVcs.AIcs.CLRecentJun 1, 2026

Jailbreaking Multimodal Large Language Models using Multi-Clip Video

Choongwon Kang, Seungjong Sun, Hyunmin Jun, Jang Hyun Kim

The paper introduces Multi-Clip Video (MCV) SafetyBench, a dataset demonstrating that the vulnerability of Multimodal Large Language Models (MLLMs) to jailbreaking increases with the diversity and num…

View →

cs.CRcs.CVRecentMay 15, 2026

A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation

Hao Yang, Zhuo Ma, Yang Liu, Yilong Yang +2 more

The paper introduces CrossMPI, a novel cross-modal prompt injection attack that uses image-only perturbations to steer the interpretation of both textual and visual inputs in Large Vision-Language Mod…

View →

cs.CRcs.AIRecentMay 21, 2026

Safeguarding Text-to-Image Generative Models Against Unauthorized Knowledge Distillation

Yilan Gao, Sida Huang, Hongyuan Zhang, Xuelong Li

The paper introduces WaveGuard, a frequency-aware, single-pass defense framework that safeguards text-to-image models by injecting structured, imperceptible perturbations into generated images, thereb…

View →

cs.CLcs.CRRecentMay 8, 2026

GLiGuard: Schema-Conditioned Classification for LLM Safeguard

Urchade Zaratiana, Mary Newhauser, George Hurn-Maloney, Ash Lewis

GLiGuard introduces a compact, schema-conditioned bidirectional encoder that achieves state-of-the-art performance in LLM content moderation across multiple safety dimensions while drastically reducin…

View →

cs.CVcs.AIcs.CRRecentMar 25, 2026

When Understanding Becomes a Risk: Authenticity and Safety Risks in the Emerging Image Generation Paradigm

Ye Leng, Junjie Chu, Mingjie Li, Chenhao Lin +4 more

The paper analyzes that while multimodal large language models (MLLMs) offer superior semantic understanding for image generation, this enhanced capability significantly increases safety risks, partic…

View →

cs.CRcs.AIcs.CYRecentMar 24, 2026

Robust Safety Monitoring of Language Models via Activation Watermarking

Toluwani Aremu, Daniil Ognev, Samuele Poppi, Nils Lukas

This paper addresses the vulnerability of existing LLM safety monitors to adaptive attackers and proposes activation watermarking, a technique that significantly improves detection robustness against…

View →

cs.CRcs.LGRecentApr 17, 2026

SafeLM: Unified Privacy-Aware Optimization for Trustworthy Federated Large Language Models

Noor Islam S. Mohammad, Uluğ Bayazıt

SafeLM is a comprehensive framework that jointly addresses privacy, security, misinformation, and adversarial robustness in federated LLMs, achieving high safety performance while significantly reduci…

View →

cs.AIcs.CRRecentMay 18, 2026

Safety Geometry Collapse in Multimodal LLMs and Adaptive Drift Correction

Jiahe Guo, Xiangran Guo, Jiaxuan Chen, Weixiang Zhao +5 more

This paper introduces the concept of Safety Geometry Collapse, demonstrating that multimodal inputs degrade the safety separation of LLMs, and proposes ReGap, a training-free method that adaptively co…

View →

cs.CRcs.AIRecentApr 6, 2026

SALLIE: Safeguarding Against Latent Language & Image Exploits

Guy Azov, Ofer Rivlin, Guy Shtar

SALLIE introduces a lightweight, modal-agnostic runtime detection framework that effectively safeguards LLMs and VLMs against both textual and visual jailbreaks and prompt injections without performan…

View →

cs.CRcs.LGRecentMay 5, 2026

Laundering AI Authority with Adversarial Examples

Jie Zhang, Pura Peetathawatchai, Florian Tramèr, Avital Shafran

The paper demonstrates that adversarial examples can be used to manipulate Vision-Language Models (VLMs) into confidently providing authoritative but incorrect information, a process termed 'AI author…

View →

cs.CLcs.AIRecentMay 30, 2026

MLLM-Microscope: Unlocking Hidden Structure Within Multimodal Large Language Models

Ravil Mussabayev, Rustam Mussabayev

The paper introduces MLLM-Microscope, a system that analyzes the internal structure of multimodal large language models (MLLMs), finding that modality fusion significantly impacts the linearity and di…

View →

cs.CVcs.AIcs.CRRecentMar 17, 2026

REFORGE: Multi-modal Attacks Reveal Vulnerable Concept Unlearning in Image Generation Models

Yong Zou, Haoran Li, Fanxiao Li, Shenyang Wei +4 more

The paper introduces REFORGE, a black-box red-teaming framework that uses adversarial image prompts to reveal persistent vulnerabilities in current Image Generation Model Unlearning (IGMU) methods.

View →

cs.CRcs.AIcs.LGEmpiricalRecentJul 28, 2026

Architectural Backdoors in Vision-Language Model Supply Chains via Representation Steering

Maria Rosaria Briglia, Igor Maljkovic, Antonio Emanuele Cinà, Luca Oneto +2 more

Researchers demonstrate how malicious providers can embed architectural backdoors into Vision--Language Model (VLM) supply chains through representation steering.

View →

cs.CLRecentMay 28, 2026

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

Zhihao Wu, Gracia Gong, Qinglin Zhu, Yudong Chen +1 more

The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detect…

View →

cs.CLcs.CRcs.LGRecentApr 3, 2026

Learning the Signature of Memorization in Autoregressive Language Models

David Ilić, Kostadin Cvejoski, David Stanojević, Evgeny Grigorenko

The paper introduces a novel, transferable learned attack (LT-MIA) that detects a universal 'signature of memorization' in language models, achieving high accuracy across diverse model architectures (…

View →

cs.CRRecentApr 19, 2026

GuardPhish: Securing Open-Source LLMs from Phishing Abuse

Rina Mishra, Gaurav Varshney, Doddipatla Sesha Sahithi

The paper introduces GuardPhish, a large-scale dataset and evaluation framework, demonstrating that even high-performing open-source LLMs can generate actionable phishing content despite accurate inte…

View →