Papers similar to 2604.16515v1

20 results

cs.CR, cs.AI · Recent · Mar 17, 2026

Adversarial attacks against Modern Vision-Language Models

The paper evaluates the adversarial robustness of two open-source Vision-Language Models (LLaVA and Qwen2.5-VL) in a simulated e-commerce environment, finding that while LLaVA is vulnerable to gradien…


cs.CR, cs.CV · Recent · May 15, 2026

A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation

Hao Yang, Zhuo Ma, Yang Liu, Yilong Yang +2 more

The paper introduces CrossMPI, a novel cross-modal prompt injection attack that uses image-only perturbations to steer the interpretation of both textual and visual inputs in Large Vision-Language Mod…


cs.CV, cs.CL · Recent · May 29, 2026

Learning from Fine-Grained Visual Discrepancies: Mitigating Multimodal Hallucinations via In-Context Visual Contrastive Optimization

Haolin Deng, Xin Zou, Zhiwei Jin, Chen Chen +2 more

The paper proposes In-Context Visual Contrastive Optimization (IC-VCO) to rigorously mitigate multimodal hallucinations in Vision-Language Models by optimizing contrastive learning within a shared mul…


cs.CR, cs.AI · Recent · Mar 30, 2026

Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey

Bhavuk Jain, Sercan Ö. Arık, Hardeo K. Thakur

This survey provides a comprehensive taxonomy and vulnerability-centric analysis of adversarial attacks targeting Multimodal Large Language Models (MLLMs), offering an explanatory framework for enhanc…


cs.AI, cs.CR · Recent · May 18, 2026

Hallucination as Exploit: Evidence-Carrying Multimodal Agents

Guijia Zhang, Hao Zheng, Harry Yang

The paper introduces Evidence-Carrying Agents (ECA) to prevent multimodal agents from executing privileged actions based on unsupported or hallucinated perceptual claims, achieving near-zero unsafe ex…


cs.CR · Recent · May 8, 2026

Cross-Modal Backdoors in Multimodal Large Language Models

Runhe Wang, Li Bai, Haibo Hu, Songze Li

The paper proposes a novel cross-modal backdoor attack that exploits the vulnerability of lightweight connectors in multimodal LLMs, demonstrating high attack success rates across different modalities…


cs.CV, cs.AI · Recent · May 28, 2026

Mitigating Hallucination in Vision-Language Models through Barrier-Regulated Adaptive Closed-form Steering

Soumyadeep Jana, Pulkit Mittal, Sanasam Ranbir Singh

The paper proposes BRACS, a training-free steering framework that adaptively corrects visual grounding failures in large vision-language models, significantly reducing object hallucination without sac…


econ.TH, cs.CR, cs.GT · Recent · May 5, 2026

The Adversarial Discount -- AI, Signal Correlation, and the Cybersecurity Arms Race

James W. Bono

The paper models the cybersecurity arms race using a contest-theoretic framework, showing that full cross-correlation of threat intelligence can neutralize the attacker's structural advantage from inc…


cs.CR, cs.AI, cs.CL · Recent · May 14, 2026

To See is Not to Learn: Protecting Multimodal Data from Unauthorized Fine-Tuning of Large Vision-Language Model

Chengshuai Zhao, Zhen Tan, Dawei Li, Zhiyuan Yu +1 more

The paper proposes MMGuard, a proactive defense mechanism that injects unlearnable, human-imperceptible perturbations into multimodal data to prevent unauthorized fine-tuning of Large Vision-Language…


cs.CV, cs.AI · Recent · May 28, 2026

Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset

Qian Chen, Xianyin Zhang, Yanzhi Liu, Lifan Guo +2 more

This paper introduces CFMME, a comprehensive Chinese financial multimodal benchmark, and evaluates current Large Vision-Language Models (LVLMs), finding that while state-of-the-art models perform mode…


cs.CL, cs.CR, cs.LG · Recent · Mar 29, 2026

Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models

Duanyi Yao, Changyue Li, Zhicong Huang, Cheng Hong +1 more

The paper introduces Hidden Ads, a novel backdoor attack for Vision-Language Models (VLMs) that injects unauthorized advertisements by exploiting natural, recommendation-seeking user behaviors, mainta…


cs.AI · Recent · May 27, 2026

Utility-Aware Multimodal Contrastive Learning for Product Image Generation

Xiaohang Feng, Yiling Xie

The paper proposes a utility-aware multimodal contrastive learning framework that optimizes product image generation not just for semantic coherence, but also for maximizing consumer demand in online…


cs.CR, cs.LG · Recent · Apr 7, 2026

Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models

Yiyang Zhang, Chaojian Yu, Ziming Hong, Yuanjie Shao +3 more

The paper proposes a novel Text-Guided Backdoor (TGB) attack that uses common words in text descriptions as stealthy triggers for multimodal models, enhancing practicality and controllability.


cs.CR · Recent · Apr 3, 2026

ContractShield: Bridging Semantic-Structural Gaps via Hierarchical Cross-Modal Fusion for Multi-Label Vulnerability Detection in Obfuscated Smart Contracts

Minh-Dai Tran-Duong, Nguyen Hai Phong, Nguyen Chi Thanh, Doan Minh Trung +3 more

ContractShield is a robust multimodal framework that uses a novel three-level fusion mechanism to accurately detect multiple types of vulnerabilities in obfuscated smart contracts, significantly outpe…


cs.DC, cs.AI · Recent · Jun 1, 2026

Boosting Multimodal Federated Learning via Chained Modality Optimization

Zixin Zhang, Fan Qi, Shuai Li, Xiaoshan Yang +1 more

The paper proposes FedMChain, a novel federated learning framework that structures multimodal training into sequential phases to mitigate modality competition and improve model performance while reduc…


cs.CR, cs.AI · Recent · May 2, 2026

VisInject: Disruption != Injection -- A Dual-Dimension Evaluation of Universal Adversarial Attacks on Vision-Language Models

Pang Liu, Yingjie Lao

The paper introduces a dual-dimension evaluation for universal adversarial attacks on Vision-Language Models (VLMs), demonstrating that high reported attack success rates significantly overestimate th…


cs.CL, cs.AI, cs.CR · Recent · May 12, 2026

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

Buyun Liang, Jinqi Luo, Liangzu Peng, Kwan Ho Ryan Chan +5 more

The paper introduces REALISTA, a novel latent-space adversarial attack framework that generates semantically realistic and coherent prompts to effectively induce hallucinations in large language model…


cs.CV, cs.AI, cs.CR · Recent · Apr 10, 2026

Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection

Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong

The paper introduces ImageProtector, a user-side method that embeds an imperceptible perturbation into images to prevent Multi-modal Large Language Models (MLLMs) from analyzing and extracting sensiti…


cs.LG, cs.CR · Recent · Apr 13, 2026

Reducing Hallucination in Enterprise AI Workflows via Hybrid Utility Minimum Bayes Risk (HUMBR)

Chenhao Fang, Jordi Mola, Mark Harman, Jason Nawrocki +9 more

The paper introduces a Hybrid Utility Minimum Bayes Risk (HUMBR) framework to significantly reduce hallucinations in high-stakes enterprise AI workflows, outperforming standard consistency methods.


cs.CR, cs.AI, cs.CV · Recent · May 15, 2026

DarkLLM: Learning Language-Driven Adversarial Attacks with Large Language Models

Ye Sun, Xin Wang, Jiaming Zhang, Yifeng Gao +6 more

DarkLLM introduces a novel framework that uses a Large Language Model (LLM) to translate natural language instructions into flexible, latent adversarial attack vectors, demonstrating a systemic vulner…
