Papers similar to 2606.01904

~ similar to 2606.01904· 19 results

cs.CLRecentJun 1, 2026

Towards Multidisciplinary Summarization of Hospital Stays: Efficient Sentence-Level Clinical Provenance Categorization

Baris Karacan, Vaibhav Bhargava, Barbara Di Eugenio, Natalie Parde +20 more

The paper introduces a supervised fine-tuning pipeline using large language models to accurately categorize sentence-level clinical provenance across multi-disciplinary hospital notes, demonstrating t…

View →

cs.CLcs.AIRecentMay 28, 2026

Same Patient, Different Words, Different Diagnosis? Evaluating Semantic Stability in Clinical LLMs

Mahdi Alkaeed, Adnan Qayyum, Nabeel Abo Kashreef, Muhammad Bilal +1 more

The paper evaluates the semantic stability of clinical LLMs to linguistic variations, finding that domain specialization does not guarantee consistent robustness improvements.

View →

cs.CLRecentMay 29, 2026

Reliable Multilingual Orthopedic Decision Support from Clinical Narratives: Language-Aware Adaptation and Verification-Guided Deferral

Danish Ali, Li Xiaojian, Sundas Iqbal, Farrukh Zaidi

The paper introduces a reliability-oriented framework, IndicBERT-HPA, for multilingual orthopedic decision support from clinical narratives, achieving high performance and significantly improving reli…

View →

cs.LGcs.CRRecentApr 29, 2026

Fidelity, Diversity, and Privacy: A Multi-Dimensional LLM Evaluation for Clinical Data Augmentation

Guillermo Iglesias, Gema Bello-Orgaz, María Navas-Loro, Cristian Ramirez-Atencia +2 more

This paper evaluates multiple LLMs (DeepSeek-R1, OpenBioLLM-Llama3, Qwen 3.5) for generating privacy-safe, high-quality synthetic mental health reports, demonstrating their effectiveness in expanding…

View →

cs.CLcs.AIcs.LGRecentMay 28, 2026

Generalistic or Specific Embeddings, Which is Better? An Empirical Study on Search for Clinical Coding in Non-English Languages

David Rey-Blanco, Roberto Cruz

The authors demonstrate that fine-tuning a two-stage retrieval system using synthetic data generated by large language models can significantly improve the performance of medical semantic search for c…

View →

cs.CLcs.AIRecentMay 28, 2026

MedCase-Structured: A Text-to-FHIR Dataset for Benchmarking Diagnostic Reasoning in Clinically Realistic EHR Settings

Valentina Bui Muti, Eugénie Dulout, Ziquan Fu

The paper introduces MedCase-Structured, a synthetic, FHIR-formatted dataset designed to benchmark diagnostic reasoning in realistic EHR settings, showing that LLMs perform worse on structured data th…

View →

cs.AIRecentMay 28, 2026

Think Fast, Talk Smart: Partitioning Deterministic and Neural Computation for Structured Health Text Generation

Kai-Chen Cheng, Haejun Han, David Q. Sun

The paper proposes 'Think Fast, Talk Smart,' a pipeline that separates deterministic data analysis from LLM generation, showing that offloading recurring, structured tasks to code significantly improv…

View →

cs.CRcs.AIcs.CLRecentApr 23, 2026

Differentially Private De-identification of Dutch Clinical Notes: A Comparative Evaluation

Michele Miranda, Xinlan Yan, Nishant Mishra, Rachel Murphy +3 more

This paper conducts the first comparative study of Differential Privacy (DP), Named Entity Recognition (NER), and Large Language Models (LLMs) for de-identifying Dutch clinical notes, finding that com…

View →

cs.CLcs.AIRecentMay 28, 2026

Internal Representation, Not Clinical Knowledge: Where Apparent LLM Triage Failures Originate

David Fraile Navarro, Berardino Como, Jialei Sheng, Soundariya Ananthan +1 more

The paper investigates apparent LLM triage failures and concludes that the errors originate in the output format and decision process, rather than a deficiency in the model's underlying clinical knowl…

View →

cs.AIcs.CLcs.ETRecentJun 1, 2026

ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents

Yuxing Lu, Yushuhong Lin, Wenqi Shi, J. Ben Tamo +3 more

The paper introduces ClinEnv, a novel interactive, multi-stage benchmark designed to evaluate LLMs' decision-making and information-gathering process during longitudinal inpatient medical simulations.

View →

cs.CLcs.AIRecentMay 27, 2026

Reverse Probing: Supervised Token-level Uncertainty Quantification for Large Language Models in Clinical Text

Bushi Xiao, Sarvesh Soni, Daisy Zhe Wang

The paper introduces Reverse Probing, a novel framework that quantifies token-level uncertainty in large language models (LLMs) specifically for clinical text by analyzing internal model activations,…

View →

cs.AIcs.CLcs.CYRecentMay 27, 2026

MIRA: A Bilingual Benchmark for Medical Information Response Audit

Mengyu Xu, Qiaoxin Yang, Qianqian Wang, Xiwei Dai +2 more

The paper introduces MIRA, a bilingual benchmark that reveals that LLMs tend to dilute or omit critical medical information when responding to prompts from users with low health literacy, a pattern te…

View →

cs.CLRecentJun 1, 2026

Why Do Self-Harm Prediction Models Struggle to Generalise? Lexical and Semantic Variations in Emergency Department Triage Notes

Liuliu Chen, Mike Conway, Jo Robinson, Vlada Rozova

This paper investigates why self-harm prediction models struggle to generalize across different hospitals, finding that variations in local lexical expression and feature importance are the primary ca…

View →

cs.CVcs.AIRecentMay 29, 2026

Simple Token-Efficient Vision-Language Model for Case-level Pathology Synoptic Report Generation

Zhiyuan Yang, Jiahao Cheng, Vincent Quoc-Huy Trinh, Mahdi S. Hosseini

The paper introduces a simple, token-efficient vision-language model for generating comprehensive pathology synoptic reports from multiple whole-slide images (WSIs), achieving high performance while s…

View →

cs.CLRecentJun 1, 2026

PortBERT: Navigating the Depths of Portuguese Language Models

Raphael Scheible-Schmitt, Henry He, Armando B. Mendes

The paper introduces PortBERT, a family of RoBERTa-based language models for Portuguese, which achieves competitive performance while explicitly balancing efficiency and accuracy.

View →

cs.CVcs.AIcs.CLRecentMay 29, 2026

Generating Reports or Repeating Templates? Measuring and Mitigating Template Collapse in 3D CT Report Generation

Tom Maye-Lasserre, Yitong Li, Bailiang Jian, Morteza Ghahremani +2 more

The paper addresses 'Template Collapse' in 3D CT report generation—where models generate generic reports—by proposing CLarGen, a decoupled framework that significantly improves clinical accuracy and d…

View →

cs.IRcs.CLDatasetRecentJun 9, 2026

A PubMed-Scale Dataset of Structured Biomedical Abstracts

Chia-Hsuan Chang, Haerin Song, Brian Ondov, Hua Xu

The authors introduce Structured PubMed, a comprehensive corpus of section-labeled biomedical abstracts compiled from the complete PubMed database.

View →

cs.AIRecentMay 27, 2026

Trends in AI and Human-AI Interaction in Clinical Trials -- A Hybrid Human-AI Exploration

Sandra Woolley, Tim Collins, Khalid Khattak, Illia Chernomorets +2 more

This study analyzes ClinicalTrials.gov records to track the rising trend of AI in clinical trials and demonstrates that a hybrid human-AI screening approach is viable but requires clearer reporting of…

View →

cs.AIRecentMay 27, 2026

SafeMed-R1: Clinician-Audited Safety and Ethics Alignment for Medical Large Language Models

Chao Ding, Mouxiao Bian, Tianbin Li, Minjia Yuan +11 more

The paper introduces SafeMed-R1, a clinically audited LLM that significantly improves safety and ethical alignment for medical applications, matching or exceeding resident performance on safety-critic…

View →