Papers similar to 2606.01856

18 results

cs.AI · Recent · May 27, 2026

FedMPT: Federated Multi-label Prompt Tuning of Vision-Language Models

Xucong Wang, Pengkun Wang, Zhe Zhao, Liheng Yu +2 more

FedMPT introduces a novel federated learning framework for Multi-Label Recognition (MLR) using Vision-Language Models (VLMs) by leveraging generalizable conditions to mitigate label overfitting and im…


cs.LG, cs.CR · Recent · May 20, 2026

Choose Wisely and Privately: Proactive Client Selection for Fair and Efficient Federated Learning

Adda Akram Bendoukha, Heber Hwang Arcolezi, Nesrine Kaaniche, Aymen Boudguiga

The paper proposes a proactive client selection framework that optimizes the selection of client subsets to ensure high data utility and fairness before federated learning begins, leading to faster an…


cs.CR, cs.AI · Recent · Mar 30, 2026

Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey

Bhavuk Jain, Sercan Ö. Arık, Hardeo K. Thakur

This survey provides a comprehensive taxonomy and vulnerability-centric analysis of adversarial attacks targeting Multimodal Large Language Models (MLLMs), offering an explanatory framework for enhanc…


cs.LG, cs.AI, cs.CR · Recent · May 8, 2026

UMEDA: Unified Multi-modal Efficient Data Fusion for Privacy-Preserving Graph Federated Learning via Spectral-Gated Attention and Diffusion-Based Operator Alignment

Shih-Yu Lai, Hirozumi Yamaguchi, Shang-Tse Chen, Yu-Lun Liu +1 more

UMEDA introduces a novel graph federated learning framework that uses spectral signal processing and diffusion models to enable privacy-preserving, robust localization across clients with highly heter…


cs.CR · Recent · May 8, 2026

Cross-Modal Backdoors in Multimodal Large Language Models

Runhe Wang, Li Bai, Haibo Hu, Songze Li

The paper proposes a novel cross-modal backdoor attack that exploits the vulnerability of lightweight connectors in multimodal LLMs, demonstrating high attack success rates across different modalities…


cs.CR, cs.AI, cs.LG · Recent · May 20, 2026

Frequency-Domain Regularized Adversarial Alignment for Transferable Attacks against Closed-Source MLLMs

Leitao Yuan, Qinghua Mao, Daizong Liu, Kun Wang +4 more

The paper proposes FRA-Attack, a frequency-domain regularization method, to significantly improve the transferability of adversarial attacks against closed-source Multimodal Large Language Models (MLL…


cs.IR, cs.AI, cs.LG · Recent · May 28, 2026

Multimodal Music Recommendation System using LLMs

Srikar Prabhas Kandagatla, Sreehitha R. Narayana, Chandana Magapu, Swetha Mohan +5 more

The paper proposes a novel multimodal framework for session-based music recommendation that jointly models audio, lyric, and semantic content signals within a unified LLM-based sequential reasoning sy…


cs.CR, cs.AI, cs.CL · Recent · May 14, 2026

To See is Not to Learn: Protecting Multimodal Data from Unauthorized Fine-Tuning of Large Vision-Language Model

Chengshuai Zhao, Zhen Tan, Dawei Li, Zhiyuan Yu +1 more

The paper proposes MMGuard, a proactive defense mechanism that injects unlearnable, human-imperceptible perturbations into multimodal data to prevent unauthorized fine-tuning of Large Vision-Language…


cs.LG, cs.AI, cs.DC · Recent · May 29, 2026

Federated Variational Preference Alignment with Gumbel-Softmax Prior for Personalized User Preferences

Jabin Koo, Hoyoung Kim, Minwoo Jang, Jungseul Ok

The paper proposes FedVPA-GP, a federated learning framework that uses a Gumbel-Softmax prior and orthogonal loss to personalize LLM alignment by disentangling conflicting user preferences while maint…


cs.LG, cs.CV · Recent · Jun 1, 2026

Closing the Alignment-Maturity Gap in Federated Prototype Learning

Mario Casado-Diez, Alejandro Dopico-Castro, Verónica Bolón-Canedo, Bertha Guijarro-Berdiñas

The paper proposes FedSAP, a framework that stabilizes federated prototype learning by delaying global alignment and enforcing inter-class structure, significantly improving representation quality und…


cs.CL, cs.AI · Recent · May 30, 2026

MLLM-Microscope: Unlocking Hidden Structure Within Multimodal Large Language Models

Ravil Mussabayev, Rustam Mussabayev

The paper introduces MLLM-Microscope, a system that analyzes the internal structure of multimodal large language models (MLLMs), finding that modality fusion significantly impacts the linearity and di…


cs.CR, cs.CV · Recent · May 15, 2026

A Cross-Modal Prompt Injection Attack against Large Vision-Language Models with Image-Only Perturbation

Hao Yang, Zhuo Ma, Yang Liu, Yilong Yang +2 more

The paper introduces CrossMPI, a novel cross-modal prompt injection attack that uses image-only perturbations to steer the interpretation of both textual and visual inputs in Large Vision-Language Mod…


cs.CV, cs.AI · Recent · May 28, 2026

Benchmarking Large Vision-Language Models on CFMME: A Comprehensive Chinese Financial Multimodal Evaluation Dataset

Qian Chen, Xianyin Zhang, Yanzhi Liu, Lifan Guo +2 more

This paper introduces CFMME, a comprehensive Chinese financial multimodal benchmark, and evaluates current Large Vision-Language Models (LVLMs), finding that while state-of-the-art models perform mode…


cs.AI · Recent · May 31, 2026

Towards Understanding Modality Interaction in Multimodal Language Models via Partial Information Decomposition

Wanlong Fang, Tianle Zhang, Wen Tao, Alvin Chan

The paper introduces Partial Information Decomposition (PID) to quantitatively separate unique, redundant, and synergistic contributions of different modalities (e.g., vision, language) in multimodal…


cs.AI · Recent · May 27, 2026

A Conflict-Aware Penalty and Statistical Loss Framework for Balancing Modalities and Enhancing Stability in Multimodal Sentiment Analysis

Jianheng Dai, Jiazhang Liang, Sijie Mai

The paper introduces a Conflict-aware Penalty (CP) and Statistical Loss (SL) framework to stabilize and balance the training of multimodal sentiment analysis models, achieving state-of-the-art perform…


cs.CR, cs.LG · Recent · May 7, 2026

FedAttr: Towards Privacy-preserving Client-Level Attribution in Federated LLM Fine-tuning

Su Zhang, Junfeng Guo, Heng Huang

FedAttr introduces a novel client-level attribution protocol for Federated Learning (FL) that accurately identifies which clients trained on watermarked data while maintaining strong privacy guarantee…


cs.LG, cs.AI, math.OC · Recent · May 28, 2026

A Unified Framework for Gradient Aggregation in Multi-Objective Optimization

Zeou Hu, Kelvin Ho, Yaoliang Yu

The paper introduces a unified theoretical framework for gradient aggregation in multi-objective optimization, establishing convergence rates and sufficient conditions for achieving Pareto stationarit…


cs.AI, cs.CR · Recent · May 18, 2026

Safety Geometry Collapse in Multimodal LLMs and Adaptive Drift Correction

Jiahe Guo, Xiangran Guo, Jiaxuan Chen, Weixiang Zhao +5 more

This paper introduces the concept of Safety Geometry Collapse, demonstrating that multimodal inputs degrade the safety separation of LLMs, and proposes ReGap, a training-free method that adaptively co…
