Papers similar to 2606.02156

~ similar to 2606.02156· 20 results

cs.CLcs.AIRecentMay 28, 2026

SURGENT: A Surgical Multi-Agent Assistance System Across the Perioperative Workflow

Dongsheng Shi, Yue Li, Xin Yi, Yongyi Cui +2 more

The paper introduces SURGENT, a multi-agent assistance system designed for the entire perioperative workflow, which outperforms standard LLMs by providing context-aware, traceable, and privacy-preserv…

View →

cs.LGcs.AIcs.CLRecentMay 28, 2026

Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents

Matt Turk

The paper introduces the Causal Sensitivity Score (CSS), an interventional metric that reveals that standard coverage-based evaluations fail to detect critical responsiveness deficits in clinical LLMs…

View →

cs.AIRecentJun 1, 2026

AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

Junqi Liu, Salena Song, Yuhan Wang, Jiawei Mao +11 more

The paper introduces AutoMedBench, a novel workflow-aware benchmark that evaluates autonomous medical-AI agents across a five-stage research process, revealing that agents struggle most with validatio…

View →

cs.CRcs.AIcs.CLRecentMay 1, 2026

When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI

Alfredo Madrid-García, Miguel Rujas

This paper demonstrates that patient-facing RAG chatbots frequently expose sensitive system configurations, knowledge base details, and conversation history through client-server communication, posing…

View →

cs.AIcs.CLcs.CYRecentMay 27, 2026

MIRA: A Bilingual Benchmark for Medical Information Response Audit

Mengyu Xu, Qiaoxin Yang, Qianqian Wang, Xiwei Dai +2 more

The paper introduces MIRA, a bilingual benchmark that reveals that LLMs tend to dilute or omit critical medical information when responding to prompts from users with low health literacy, a pattern te…

View →

cs.CLcs.AIcs.IRRecentMay 27, 2026

Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG

Yubo Li, Rema Padman, Ramayya Krishnan

This paper introduces a framework to audit source-dependence in multi-source RAG systems, demonstrating that disagreement across institutional sources is a common and critical failure mode that curren…

View →

cs.LGcs.CVRecentJun 1, 2026

Entropy Minimization without Model Collapse: Mitigating Prediction Bias in Medical Imaging

Tim Nielen, Sameer Ambekar, Johannes Kiechle, Daniel M. Lang +1 more

This paper identifies prediction bias, a failure mode of entropy minimization in test-time adaptation, and proposes Distribution Shift Bias Reduction (DSBR) to stabilize adaptation and prevent model c…

View →

cs.CVcs.AIcs.CLRecentJun 1, 2026

Cross-modal linkage risk in clinical vision-language models

Soroosh Tayebi Arasteh, Mahshad Lotfinia, Sven Nebelung, Daniel Truhn

The paper demonstrates that clinical vision-language models (VLMs) pose a significant privacy risk by allowing de-identified images to be re-linked to original reports, and proposes a targeted differe…

View →

cs.AIRecentMay 27, 2026

SafeMed-R1: Clinician-Audited Safety and Ethics Alignment for Medical Large Language Models

Chao Ding, Mouxiao Bian, Tianbin Li, Minjia Yuan +11 more

The paper introduces SafeMed-R1, a clinically audited LLM that significantly improves safety and ethical alignment for medical applications, matching or exceeding resident performance on safety-critic…

View →

cs.AIcs.CLcs.ETRecentJun 1, 2026

ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents

Yuxing Lu, Yushuhong Lin, Wenqi Shi, J. Ben Tamo +3 more

The paper introduces ClinEnv, a novel interactive, multi-stage benchmark designed to evaluate LLMs' decision-making and information-gathering process during longitudinal inpatient medical simulations.

View →

cs.AIcs.CRq-fin.RMRecentJun 2, 2026

From Control Boundary to Insurance Claim: Reconstructing AI-Mediated Losses Through the CER Framework

Alex Leung, Rex Zhang, Kentaroh Toyoda, SiewMei Loh

This paper introduces the CER framework to address the complex problem of reconstructing AI-mediated losses for insurance claims, moving beyond simple event reconstruction to analyze the system's oper…

View →

cs.CVcs.AIcs.HCRecentMay 30, 2026

CodeCytos: AI-assisted spatial molecular imaging analysis via code-augmented agent action space

Hung Q. Vo, Huy Q. Vo, Son T. Ly, Zhihao Wan +5 more

CodeCytos is a novel coding-based reasoning agent framework that enables dynamic, programmable interaction with spatial molecular imaging data, significantly improving the automation and customization…

View →

cs.AIRecentMay 28, 2026

EHRBench: An Automated and Reliable EHR-based Benchmark for Clinical Decision Making with LLMs

Yuzhang Xie, Keqi Han, Yunpeng Xiao, Hejie Cui +6 more

The paper introduces EHRBench, a large-scale, automated, and reliable benchmark derived from real Electronic Health Records (EHRs) to rigorously evaluate the clinical decision-making capabilities of L…

View →

cs.LGcs.AIstat.MLRecentMay 30, 2026

A Practical Upper Bound on Selection Bias Effects in Medical Prediction Models

Kara Liu, Maggie Wang, Russ B. Altman

The paper proposes a novel, practical upper bound to estimate the worst-case performance of medical prediction models on the target population, even when the selection bias mechanism and target data a…

View →

cs.IRcs.CLRecentMay 29, 2026

Evaluating Factual Density in Multi-Source RAG: A Study in Medical AI Accuracy

Michael R. DeMarco

The paper introduces Factual Density (FD*), a novel retrieval signal that measures the proportion of verified facts, demonstrating that optimizing RAG retrieval based on this density significantly imp…

View →

cs.CRRecentMar 30, 2026

Policy-Driven Vulnerability Risk Quantification framework for Large-Scale Cloud Infrastructure Data Security

Wanru Shao

The paper proposes MVRAF, a data-driven framework that quantifies vulnerability risk in large-scale cloud infrastructure by integrating multiple attack attributes and analyzing cumulative risk distrib…

View →

cs.CRRecentApr 10, 2026

Hagenberg Risk Management Process (Part 3): Operationalization, Probabilities, and Causal Analysis

Eckehard Hermann, Harald Lampesberger

The paper introduces a comprehensive framework, Realtime Risk Studio, that operationalizes qualitative risk models (Bowtie diagrams) into formal, probabilistic, and intervention-ready runtime models u…

View →

cs.CRcs.AIcs.CLRecentApr 3, 2026

An Independent Safety Evaluation of Kimi K2.5

Zheng-Xin Yong, Parv Mahajan, Andy Wang, Ida Caspary +11 more

The paper conducts a preliminary safety evaluation of the open-weight LLM Kimi K2.5, finding that while it is highly capable, it exhibits concerning dual-use risks, particularly regarding CBRNE misuse…

View →

cs.CRRecentMay 28, 2026

SAMD: A Tool for Identifying False Data Injection Scenarios in AI/ML-enabled Medical Devices

Mohammadreza Hallajiyan, Xueren Ge, Athish Pranav Dharmalingam, Gargi Mitra +3 more

The paper introduces SAMD, an automated tool that uses STPA-Sec to identify potential false data injection attack scenarios in AI/ML-enabled medical devices during the design phase.

View →

cs.CRcs.AIcs.NIRecentMar 26, 2026

Sovereign AI at the Front Door of Care: A Physically Unidirectional Architecture for Secure Clinical Intelligence

Vasu Srinivasan, Dhriti Vasu

The paper proposes a Sovereign AI architecture for clinical triage that ensures maximum security by performing all inference on-device and receiving data only through physically unidirectional channel…

View →