Similar to 2605.30984 · 20 results
The paper introduces a simple, token-efficient vision-language model for generating comprehensive pathology synoptic reports from multiple whole-slide images (WSIs), achieving high performance while s…
The paper demonstrates that clinical vision-language models (VLMs) pose a significant privacy risk by allowing de-identified images to be re-linked to original reports, and proposes a targeted differe…
Tengfei Zhang, Ziheng Zhao, Lisong Dai, Xiaoman Zhang +4 more
This paper introduces MedReCo and MedReCo-VLM, a framework that enables entity-aware cross-image reasoning for medical imaging, allowing AI to compare current scans with prior studies and analogous ca…
Guanghao Zhu, Zeyu Liu, Zhitian Hou, Pengkai Wang +8 more
The paper introduces PMC-InterCPT, a refined biomedical interleaved corpus that enhances multimodal continued pretraining by integrating figure-referencing body text alongside captions, leading to imp…
The paper proposes RL-ACRGNet, an improved encoder-decoder model that uses reinforcement learning to generate high-quality, clinically coherent chest radiology reports, significantly outperforming exi…
Zixian Su, Hongkai Zhang, Fan Gao, Encheng Su +11 more
The paper introduces CardioLens, a rigorous evaluation testbed for multi-sequence Cardiac MRI, which reveals that current Multimodal Large Language Models (MLLMs) exhibit a significant 'clinical reali…
Sunisth Kumar, Xanh Ho, Tim Schopf, Andre Greiner-Petter +2 more
The paper explains the 'table-chart gap' in scientific claim verification by showing that multimodal LLMs successfully encode information from charts but fail to route it to the final prediction layer…
The paper introduces Set-Distance Rewards (SDR), a permutation-invariant reward signal that effectively guides the generation of unordered radiology reports, significantly outperforming standard train…
The paper introduces a structured benchmark (TGAD) showing that current text-guided anomaly detection models often overstate their language conditioning, as performance significantly degrades when the…
This paper evaluates multiple LLMs (DeepSeek-R1, OpenBioLLM-Llama3, Qwen 3.5) for generating privacy-safe, high-quality synthetic mental health reports, demonstrating their effectiveness in expanding…
Yeqi Huang, Yue Chen, Yanwei Ye, Guanhao Su +1 more
The paper introduces Ryze, an automated system that synthesizes evidence-enriched Question-Answering (QA) pairs from raw biomedical papers, resulting in a specialized VLM (BioVLM-8B) that significantl…
Xinkai Ma, Zhiqi Bai, Dingling Zhang, Pei Liu +20 more
The paper introduces TVIR, a new benchmark and multi-agent deep-research framework for evaluating and improving the generation of factually reliable, text-visual interleaved reports.
The paper introduces MedCase-Structured, a synthetic, FHIR-formatted dataset designed to benchmark diagnostic reasoning in realistic EHR settings, showing that LLMs perform worse on structured data th…
Yuwei Miao, Gen Li, Yunsheng Zeng, Xiandong Li +7 more
C-MIG is a novel retrieval-augmented generation framework that uses multi-view information gain to improve clinical diagnostic reasoning by providing richer, more nuanced reward signals than existing m…
Tim Nielen, Sameer Ambekar, Johannes Kiechle, Daniel M. Lang +1 more
This paper identifies prediction bias, a failure mode of entropy minimization in test-time adaptation, and proposes Distribution Shift Bias Reduction (DSBR) to stabilize adaptation and prevent model c…
The paper introduces Factual Density (FD*), a novel retrieval signal that measures the proportion of verified facts, demonstrating that optimizing RAG retrieval based on this density significantly imp…
The paper introduces two methods, ermodel and ermodel, to significantly reduce hallucinations in clinical summarization by using hallucination detectors to guide iterative revisions and subsequently…
The paper introduces a robust, two-part framework (HyPE and HyPS) using hyperbolic geometry to efficiently detect and sanitize malicious prompts targeting Vision-Language Models (VLMs).
The paper systematically compares multimodal Transformer and LLM approaches for document type classification, finding that specialized multimodal Transformers outperform LLM-based models, especially w…
The paper investigates apparent LLM triage failures and concludes that the errors originate in the output format and decision process, rather than a deficiency in the model's underlying clinical knowl…