~ similar to 2606.02156· 20 results
Dongsheng Shi, Yue Li, Xin Yi, Yongyi Cui +2 more
The paper introduces SURGENT, a multi-agent assistance system designed for the entire perioperative workflow, which outperforms standard LLMs by providing context-aware, traceable, and privacy-preserv…
The paper introduces the Causal Sensitivity Score (CSS), an interventional metric that reveals that standard coverage-based evaluations fail to detect critical responsiveness deficits in clinical LLMs…
Junqi Liu, Salena Song, Yuhan Wang, Jiawei Mao +11 more
The paper introduces AutoMedBench, a novel workflow-aware benchmark that evaluates autonomous medical-AI agents across a five-stage research process, revealing that agents struggle most with validatio…
This paper demonstrates that patient-facing RAG chatbots frequently expose sensitive system configurations, knowledge base details, and conversation history through client-server communication, posing…
Mengyu Xu, Qiaoxin Yang, Qianqian Wang, Xiwei Dai +2 more
The paper introduces MIRA, a bilingual benchmark that reveals that LLMs tend to dilute or omit critical medical information when responding to prompts from users with low health literacy, a pattern te…
This paper introduces a framework to audit source-dependence in multi-source RAG systems, demonstrating that disagreement across institutional sources is a common and critical failure mode that curren…
Tim Nielen, Sameer Ambekar, Johannes Kiechle, Daniel M. Lang +1 more
This paper identifies prediction bias, a failure mode of entropy minimization in test-time adaptation, and proposes Distribution Shift Bias Reduction (DSBR) to stabilize adaptation and prevent model c…
The paper demonstrates that clinical vision-language models (VLMs) pose a significant privacy risk by allowing de-identified images to be re-linked to original reports, and proposes a targeted differe…
Chao Ding, Mouxiao Bian, Tianbin Li, Minjia Yuan +11 more
The paper introduces SafeMed-R1, a clinically audited LLM that significantly improves safety and ethical alignment for medical applications, matching or exceeding resident performance on safety-critic…
Yuxing Lu, Yushuhong Lin, Wenqi Shi, J. Ben Tamo +3 more
The paper introduces ClinEnv, a novel interactive, multi-stage benchmark designed to evaluate LLMs' decision-making and information-gathering process during longitudinal inpatient medical simulations.
This paper introduces the CER framework to address the complex problem of reconstructing AI-mediated losses for insurance claims, moving beyond simple event reconstruction to analyze the system's oper…
Hung Q. Vo, Huy Q. Vo, Son T. Ly, Zhihao Wan +5 more
CodeCytos is a novel coding-based reasoning agent framework that enables dynamic, programmable interaction with spatial molecular imaging data, significantly improving the automation and customization…
Yuzhang Xie, Keqi Han, Yunpeng Xiao, Hejie Cui +6 more
The paper introduces EHRBench, a large-scale, automated, and reliable benchmark derived from real Electronic Health Records (EHRs) to rigorously evaluate the clinical decision-making capabilities of L…
The paper proposes a novel, practical upper bound to estimate the worst-case performance of medical prediction models on the target population, even when the selection bias mechanism and target data a…
The paper introduces Factual Density (FD*), a novel retrieval signal that measures the proportion of verified facts, demonstrating that optimizing RAG retrieval based on this density significantly imp…
The paper proposes MVRAF, a data-driven framework that quantifies vulnerability risk in large-scale cloud infrastructure by integrating multiple attack attributes and analyzing cumulative risk distrib…
The paper introduces a comprehensive framework, Realtime Risk Studio, that operationalizes qualitative risk models (Bowtie diagrams) into formal, probabilistic, and intervention-ready runtime models u…
Zheng-Xin Yong, Parv Mahajan, Andy Wang, Ida Caspary +11 more
The paper conducts a preliminary safety evaluation of the open-weight LLM Kimi K2.5, finding that while it is highly capable, it exhibits concerning dual-use risks, particularly regarding CBRNE misuse…
The paper introduces SAMD, an automated tool that uses STPA-Sec to identify potential false data injection attack scenarios in AI/ML-enabled medical devices during the design phase.
The paper proposes a Sovereign AI architecture for clinical triage that ensures maximum security by performing all inference on-device and receiving data only through physically unidirectional channel…