~ similar to 2606.02458· 20 results
This study analyzes ClinicalTrials.gov records to track the rising trend of AI in clinical trials and demonstrates that a hybrid human-AI screening approach is viable but requires clearer reporting of…
Garvin Guo, Donglei Yu, Yu Chen, Xiang Wang +5 more
The paper argues that observed gains in multimodal agents using tools may be due to learning tool-calling patterns rather than genuine capability expansion, finding that tool access provides little co…
The BEAMS initiative establishes comprehensive benchmarks and evaluates AI tools for modeling and simulation, finding that current AI tools excel at qualitative discussion tasks but struggle with comp…
Yulei Ye, Wenhao Li, Zhong Wen, Yunshu Huang +22 more
The paper introduces AgentSchool, an advanced LLM-powered multi-agent simulator that models learning as state transitions to provide a robust, ethically viable testbed for educational research and ped…
The paper introduces a Generative AI-augmented User Experience Research (UXR) methodology, operationalized through a four-stage process, to create actionable, stigma-aware design guidance for digital…
This study surveyed higher education practitioners to map their beliefs and behaviors regarding AI integration, finding that while they view AI favorably, institutional barriers and gaps in design-ori…
Ruiyi Zhang, Peijia Qin, Qi Cao, Li Zhang +1 more
The paper introduces AIBuildAI-2, a knowledge-enhanced agent that significantly improves the automatic building of AI models by integrating an external, evolving knowledge system, achieving state-of-t…
Aditya Kumar, Zhihan Lei, Jerry Yan, Joshua W. Momo +5 more
The paper proposes a modular agent framework and novel learning methods to design and optimize practical, cost-effective, and controllable LLM-based agentic systems.
The paper introduces an agentic, framework-based system to transform under-specified academic papers into standardized, comparable, and executable benchmarks for industrial Prognostics and Health Mana…
This study investigated the stability and prompt-responsiveness of AI tools in classifying the cognitive demand of math tasks, finding that few-shot prompting was a more reliable performance booster t…
Tong Liu, Cheng Qian, Matej Cief, Yuan He +3 more
This paper analyzes tool-calling in LLM agents, demonstrating that evaluation results are highly sensitive to implementation details and proposing new techniques to significantly improve the efficienc…
Przemyslaw Biecek, Luca Longo, Jianlong Zhou, Thomas Fel +2 more
The paper advocates for the establishment of Model Science, a systematic discipline that moves beyond simple benchmarking to deeply analyze AI models' internal workings and failure modes.
The study found that while contextualizing AI responses reduces their persuasive power, combining this technique with conversational warmth restores persuasiveness, suggesting that user deference to A…
Ruihui Hou, Ziyue Huai, Chennuo Zhang, Ziyan Liu +4 more
CAREAgent is a novel agent designed for fine-grained clinical order generation, achieving significant performance improvements on unseen benchmarks by integrating structured reasoning and tool usage.
Junqi Liu, Salena Song, Yuhan Wang, Jiawei Mao +11 more
The paper introduces AutoMedBench, a novel workflow-aware benchmark that evaluates autonomous medical-AI agents across a five-stage research process, revealing that agents struggle most with validatio…
The paper identifies five persistent, deep-seated behavioral patterns ('training strata') in LLMs, observed through long-term, intimate human-AI interaction, suggesting that training artifacts survive…
Qiuyu Tian, Zequn Liu, Yingce Xia, Haojie Yin +1 more
The paper introduces ForeSci, a novel benchmark that evaluates LLM agents' ability to make forward-looking research judgments using only historical evidence, finding that explicit evidence organizatio…
Liuji Chen, Dianxing Tang, Xing Shi, Dingshuo Chen +3 more
The paper proposes EAPO, a framework that enables agentic models to learn when to forgo using external tools, thereby mitigating tool abuse while maintaining high reasoning accuracy.
Tianyi Zhou, Dongrui Liu, Leitao Yuan, Jing Shao +1 more
COLLEAGUE.SKILL introduces an automated system that distills heterogeneous traces of human expertise and role-specific knowledge into portable, inspectable, and usable AI skill packages.
Julius Gabelmann, Felix Jahn, Kevin Baum, Sophie van Rossum +3 more
This paper proposes a modular, agentic AI chatbot architecture to assist students with exercise solving, aiming to ensure responsible and pedagogically sound AI use in education.