ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2606.02276· 20 results

cs.CVcs.IRcs.LGRecentJun 4, 2026

A Vision-language Framework for Comparative Reasoning in Radiology

Tengfei Zhang, Ziheng Zhao, Lisong Dai, Xiaoman Zhang +4 more

This paper introduces MedReCo and MedReCo-VLM, a framework that enables entity-aware cross-image reasoning for medical imaging, allowing AI to compare current scans with prior studies and analogous ca…

View →
cs.CVcs.AIcs.CLRecentMay 29, 2026

Generating Reports or Repeating Templates? Measuring and Mitigating Template Collapse in 3D CT Report Generation

Tom Maye-Lasserre, Yitong Li, Bailiang Jian, Morteza Ghahremani +2 more

The paper addresses 'Template Collapse' in 3D CT report generation—where models generate generic reports—by proposing CLarGen, a decoupled framework that significantly improves clinical accuracy and d…

View →
cs.CVcs.AIRecentMay 29, 2026

Simple Token-Efficient Vision-Language Model for Case-level Pathology Synoptic Report Generation

Zhiyuan Yang, Jiahao Cheng, Vincent Quoc-Huy Trinh, Mahdi S. Hosseini

The paper introduces a simple, token-efficient vision-language model for generating comprehensive pathology synoptic reports from multiple whole-slide images (WSIs), achieving high performance while s…

View →
cs.AIcs.DBcs.IRRecentMay 29, 2026

Vector Linking via Cross-Model Local Isometric Consistency

Ziying Chen, Yang Cao, He Sun, Beining Yang +1 more

The paper proposes a novel geometric embedding hashing method to recover object correspondences (vector links) between two embedding clouds generated by different black-box encoders using only a small…

View →
cs.CVcs.AIRecentMay 29, 2026

Detect Before You Leap: Mirage Detection in Vision-Language Models

Sayeed Shafayet Chowdhury, Md. Shaown Miah

The paper introduces Text-Conditioned Layer-wise Internal Alignment (TC-LIA), a model-agnostic method that significantly improves the detection of 'mirage'—when Vision-Language Models confidently answ…

View →
cs.CVcs.AIcs.LGRecentMay 28, 2026

CardioLens: Revealing the Clinical Reality Gap of MLLMs via Multi-Sequence Cardiac MRI Evaluations

Zixian Su, Hongkai Zhang, Fan Gao, Encheng Su +11 more

The paper introduces CardioLens, a rigorous evaluation testbed for multi-sequence Cardiac MRI, which reveals that current Multimodal Large Language Models (MLLMs) exhibit a significant 'clinical reali…

View →
cs.CVcs.AIcs.CRRecentApr 10, 2026

Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection

Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong

The paper introduces ImageProtector, a user-side method that embeds an imperceptible perturbation into images to prevent Multi-modal Large Language Models (MLLMs) from analyzing and extracting sensiti…

View →
cs.CVcs.AIcs.CRRecentMar 25, 2026

When Understanding Becomes a Risk: Authenticity and Safety Risks in the Emerging Image Generation Paradigm

Ye Leng, Junjie Chu, Mingjie Li, Chenhao Lin +4 more

The paper analyzes that while multimodal large language models (MLLMs) offer superior semantic understanding for image generation, this enhanced capability significantly increases safety risks, partic…

View →
cs.AIRecentMay 30, 2026

SDR: Set-Distance Rewards for Radiology Report Generation

Halil Ibrahim Gulluk, Max Van Puyvelde, Wim Van Criekinge, Olivier Gevaert

The paper introduces Set-Distance Rewards (SDR), a permutation-invariant reward signal that effectively guides the generation of unordered radiology reports, significantly outperforming standard train…

View →
cs.LGcs.CRRecentApr 29, 2026

Fidelity, Diversity, and Privacy: A Multi-Dimensional LLM Evaluation for Clinical Data Augmentation

Guillermo Iglesias, Gema Bello-Orgaz, María Navas-Loro, Cristian Ramirez-Atencia +2 more

This paper evaluates multiple LLMs (DeepSeek-R1, OpenBioLLM-Llama3, Qwen 3.5) for generating privacy-safe, high-quality synthetic mental health reports, demonstrating their effectiveness in expanding…

View →
cs.AIcs.CRRecentMay 18, 2026

Safety Geometry Collapse in Multimodal LLMs and Adaptive Drift Correction

Jiahe Guo, Xiangran Guo, Jiaxuan Chen, Weixiang Zhao +5 more

This paper introduces the concept of Safety Geometry Collapse, demonstrating that multimodal inputs degrade the safety separation of LLMs, and proposes ReGap, a training-free method that adaptively co…

View →
cs.CLRecentMay 31, 2026

PMC-InterCPT: Rethinking Biomedical Interleaved Data for Multimodal Continued Pretraining

Guanghao Zhu, Zeyu Liu, Zhitian Hou, Pengkai Wang +8 more

The paper introduces PMC-InterCPT, a refined biomedical interleaved corpus that enhances multimodal continued pretraining by integrating figure-referencing body text alongside captions, leading to imp…

View →
cs.CLcs.LGRecentMay 30, 2026

Towards Lightweight Reliability: Using Soft Prompts for Hallucination Mitigation in Large Language Models

S M Tahmid Siddiqui, Akib Jawad Ononto, Anoop Singhal, Latifur Khan

The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…

View →
cs.AIRecentMay 27, 2026

SafeMed-R1: Clinician-Audited Safety and Ethics Alignment for Medical Large Language Models

Chao Ding, Mouxiao Bian, Tianbin Li, Minjia Yuan +11 more

The paper introduces SafeMed-R1, a clinically audited LLM that significantly improves safety and ethical alignment for medical applications, matching or exceeding resident performance on safety-critic…

View →
cs.CRcs.AIcs.CLRecentMay 14, 2026

To See is Not to Learn: Protecting Multimodal Data from Unauthorized Fine-Tuning of Large Vision-Language Model

Chengshuai Zhao, Zhen Tan, Dawei Li, Zhiyuan Yu +1 more

The paper proposes MMGuard, a proactive defense mechanism that injects unlearnable, human-imperceptible perturbations into multimodal data to prevent unauthorized fine-tuning of Large Vision-Language…

View →
cs.CLcs.AIcs.LGRecentMay 28, 2026

Generalistic or Specific Embeddings, Which is Better? An Empirical Study on Search for Clinical Coding in Non-English Languages

David Rey-Blanco, Roberto Cruz

The authors demonstrate that fine-tuning a two-stage retrieval system using synthetic data generated by large language models can significantly improve the performance of medical semantic search for c…

View →
cs.CVcs.AIEmpiricalRecentJun 10, 2026

Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models

Cheng-Yu Yang, Shao-Yuan Lo, Yu-Lun Liu

肖代替了视觉令牌的永久删除,通过可恢复的路由来改进视觉语言模型的性能

View →
cs.CLRecentMay 31, 2026

Beyond Topical Similarity: Contrastive Evidence Retrieval with Interpretable Attention Alignment in RAG

Francielle Vargas, João Robiatti, Diego Alves, Lucas Pascotti Valem +5 more

The paper introduces CERA, a novel contrastive retrieval framework that improves RAG factuality and interpretability by using subjectivity-based hard negative selection and an auxiliary attention alig…

View →
cs.CLRecentJun 1, 2026

Encoded but Not Routed: Explaining the Table-Chart Gap in Scientific Claim Verification

Sunisth Kumar, Xanh Ho, Tim Schopf, Andre Greiner-Petter +2 more

The paper explains the 'table-chart gap' in scientific claim verification by showing that multimodal LLMs successfully encode information from charts but fail to route it to the final prediction layer…

View →
cs.AIRecentMay 27, 2026

FedMPT: Federated Multi-label Prompt Tuning of Vision-Language Models

Xucong Wang, Pengkun Wang, Zhe Zhao, Liheng Yu +2 more

FedMPT introduces a novel federated learning framework for Multi-Label Recognition (MLR) using Vision-Language Models (VLMs) by leveraging generalizable conditions to mitigate label overfitting and im…

View →