~ similar to 2605.27873· 20 results
MOOSE-Copilot is a novel web-based framework that unifies scientific hypothesis discovery by formalizing human-AI interaction, significantly improving performance over autonomous LLM baselines.
Przemyslaw Biecek, Luca Longo, Jianlong Zhou, Thomas Fel +2 more
The paper advocates for the establishment of Model Science, a systematic discipline that moves beyond simple benchmarking to deeply analyze AI models' internal workings and failure modes.
Zhe Zhao, Haibin Wen, Yingcheng Wu, Jiaming Ma +9 more
The paper introduces Science Earth, a planet-scale scientific runtime that enables diverse, siloed AI capabilities to connect and collaborate dynamically, demonstrating that scientific discovery can b…
Zongsheng Cao, Bihao Zhan, Jinxin Shi, Jiong Wang +21 more
This paper introduces Agents-K1, an end-to-end knowledge orchestration pipeline that converts raw documents into agent-native scientific knowledge graphs.
Amy Xin, Jiening Siow, Junjie Wang, Zijun Yao +4 more
This paper presents EurekAgent, an environment-engineered agent system for metric-driven autonomous scientific discovery.
The paper introduces SPIRE, a multi-agent framework designed to extend LLM research capabilities to the humanities by enabling evidence-grounded interpretive reasoning over primary sources.
Qiuyu Tian, Zequn Liu, Yingce Xia, Haojie Yin +1 more
The paper introduces ForeSci, a novel benchmark that evaluates LLM agents' ability to make forward-looking research judgments using only historical evidence, finding that explicit evidence organizatio…
Yujie Luo, Xiangyuan Ru, Jingsheng Zheng, Jingjing Wang +9 more
The paper introduces Autonomous Agentic Data Engineering, demonstrating that LLMs can autonomously plan and optimize end-to-end data curation pipelines, leading to substantial performance gains in spe…
MOSAIC introduces a structured agentic framework that treats automated data science as a staged, context-grounded model selection problem, improving performance and traceability over traditional AutoM…
AutoScientists introduces a decentralized, self-organizing team of AI agents that significantly improves long-running scientific experimentation by enabling parallel exploration and knowledge sharing.
Shangheng Du, Xiangchao Yan, Jinxin Shi, Zongsheng Cao +10 more
MLEvolve is a novel self-evolving multi-agent framework that enables LLM agents to discover and optimize machine learning algorithms for complex, long-horizon tasks.
Yiming Liu, Bin Lu, Meng Jin, Ziyuan Sang +5 more
The paper introduces Compass, an expert-guided LLM agent framework that successfully extracts and integrates thousands of previously inaccessible marine lead records from vast corpora of scientific pa…
The study compares agentic data retrieval using unstructured web data versus structured, semantically-annotated datasets, concluding that semantic metadata remains essential for high-precision, reliab…
Srivatsa Kundurthy, Clara Na, Colton Moraine, Anoushka Mohta +5 more
The paper introduces BlueFin, a challenging benchmark for evaluating LLM agents on complex financial spreadsheet tasks, finding that even frontier models perform poorly, scoring less than 50% on avera…
The BEAMS initiative establishes comprehensive benchmarks and evaluates AI tools for modeling and simulation, finding that current AI tools excel at qualitative discussion tasks but struggle with comp…
The paper introduces ProjectionBench, a novel benchmark that progressively discloses information to evaluate LLMs' ability to generate scientific hypotheses, demonstrating that advanced models like GP…
The paper introduces I-WebGenBench, a framework and benchmark that converts static scientific papers into executable, interactive web systems, allowing users to dynamically explore the paper's mechani…
The paper introduces Hyperparam, a set of lightweight JavaScript libraries designed to enable direct, model-aware querying of unstructured data (like agent traces) within client-side AI applications.
Weitong Qian, Beicheng Xu, Zhongao Xie, Bowen Fan +15 more
AutoSci is a memory-centric agentic system designed to automate the entire scientific research lifecycle by integrating structured memory, multi-stage execution, and continuous self-improvement.
Aditya Kumar, Zhihan Lei, Jerry Yan, Joshua W. Momo +5 more
The paper proposes a modular agent framework and novel learning methods to design and optimize practical, cost-effective, and controllable LLM-based agentic systems.