~ similar to 2605.29368· 20 results
Ruihui Hou, Ziyue Huai, Chennuo Zhang, Ziyan Liu +4 more
CAREAgent is a novel agent designed for fine-grained clinical order generation, achieving significant performance improvements on unseen benchmarks by integrating structured reasoning and tool usage.
The paper proposes ARSM-Agent, a full-link security enhancement framework, to significantly improve the adversarial robustness and security of large language model agents used for critical medical dec…
Muhammad Bilal, Jon Crowcroft, Ruizhi Wang, Xiaolong Xu +1 more
The paper surveys the use of LLMs for agentic NetOps and AIOps, arguing that operational reliability depends not on the model itself, but on robust surrounding machinery and workflow-centered evaluati…
Xinyu Wang, Hanwei Wu, Zhenghan Tai, Sicheng Lyu +6 more
The paper introduces SafeRx-Agent, a knowledge-grounded multi-agent framework that improves medication recommendation accuracy and safety by incorporating fine-grained ATC codes and rigorous safety ve…
Junqi Liu, Salena Song, Yuhan Wang, Jiawei Mao +11 more
The paper introduces AutoMedBench, a novel workflow-aware benchmark that evaluates autonomous medical-AI agents across a five-stage research process, revealing that agents struggle most with validatio…
The paper proposes a Sovereign AI architecture for clinical triage that ensures maximum security by performing all inference on-device and receiving data only through physically unidirectional channel…
This protocol paper outlines a comprehensive, AI-driven framework using CT imaging to predict the risk of colorectal anastomotic leak, aiming to shift surgical planning toward data-driven precision.
The paper proposes 'Think Fast, Talk Smart,' a pipeline that separates deterministic data analysis from LLM generation, showing that offloading recurring, structured tasks to code significantly improv…
Chao Ding, Mouxiao Bian, Tianbin Li, Minjia Yuan +11 more
The paper introduces SafeMed-R1, a clinically audited LLM that significantly improves safety and ethical alignment for medical applications, matching or exceeding resident performance on safety-critic…
Yanan Wang, Shuaicong Hu, Jian Liu, Guohui Zhou +2 more
The paper proposes HetMedAgent, a multi-agent framework, demonstrating that combining generalist LLMs with domain-specific specialist models significantly improves medical AI performance by enabling s…
Yuxing Lu, Yushuhong Lin, Wenqi Shi, J. Ben Tamo +3 more
The paper introduces ClinEnv, a novel interactive, multi-stage benchmark designed to evaluate LLMs' decision-making and information-gathering process during longitudinal inpatient medical simulations.
The paper proposes and validates a comprehensive four-layer Zero Trust security architecture designed to mitigate critical vulnerabilities in autonomous AI agents handling Protected Health Information…
AgentSOC introduces a multi-layered agentic AI framework designed to automate Security Operations Centers (SOCs) by integrating perception, anticipatory reasoning, and risk-based action planning to im…
This study analyzes ClinicalTrials.gov records to track the rising trend of AI in clinical trials and demonstrates that a hybrid human-AI screening approach is viable but requires clearer reporting of…
Kaustav Kundu, Ritvik Shrivastava, Maxim Arap, Nanshu Wang +12 more
This paper introduces a proactive multi-modal assistant system and a large-scale dataset for procedural assistance.
Yuzhang Xie, Keqi Han, Yunpeng Xiao, Hejie Cui +6 more
The paper introduces EHRBench, a large-scale, automated, and reliable benchmark derived from real Electronic Health Records (EHRs) to rigorously evaluate the clinical decision-making capabilities of L…
Minfeng Qi, Tianqing Zhu, Zijie Xu, Congcong Zhu +2 more
The paper introduces CAESAR, a novel multi-agent framework that coordinates LLM agents across five specialized roles to improve success rates and stability in complex, multi-stage cyber intrusion task…
The paper demonstrates that tool-augmented agentic AI can learn from prior field experiment data to automatically generate superior, domain-specific interventions, transforming one-shot A/B testing in…
Qiuyu Tian, Zequn Liu, Yingce Xia, Haojie Yin +1 more
The paper introduces ForeSci, a novel benchmark that evaluates LLM agents' ability to make forward-looking research judgments using only historical evidence, finding that explicit evidence organizatio…
The paper introduces the Causal Sensitivity Score (CSS), an interventional metric that reveals that standard coverage-based evaluations fail to detect critical responsiveness deficits in clinical LLMs…