~ similar to 2605.30947· 20 results
Qiuyu Tian, Zequn Liu, Yingce Xia, Haojie Yin +1 more
The paper introduces ForeSci, a novel benchmark that evaluates LLM agents' ability to make forward-looking research judgments using only historical evidence, finding that explicit evidence organizatio…
Zongsheng Cao, Bihao Zhan, Jinxin Shi, Jiong Wang +21 more
This paper introduces Agents-K1, an end-to-end knowledge orchestration pipeline that converts raw documents into agent-native scientific knowledge graphs.
Ruiyi Zhang, Peijia Qin, Qi Cao, Li Zhang +1 more
The paper introduces AIBuildAI-2, a knowledge-enhanced agent that significantly improves the automatic building of AI models by integrating an external, evolving knowledge system, achieving state-of-t…
The study compares agentic data retrieval using unstructured web data versus structured, semantically-annotated datasets, concluding that semantic metadata remains essential for high-precision, reliab…
ResearchLoop introduces an evidence-gated control plane to manage and audit the state of AI-assisted computational research, mitigating the risk of unverified claims.
The paper outlines the potential for using generative AI to conduct large-scale, simulation-based experiments in literary studies, demonstrating initial results in generating constrained literary text…
MOOSE-Copilot is a novel web-based framework that unifies scientific hypothesis discovery by formalizing human-AI interaction, significantly improving performance over autonomous LLM baselines.
Chenghao Zhang, Guanting Dong, Yufan Liu, Tong Zhao +1 more
The paper introduces extsc{Ptah}, a multi-agent harness designed to improve verifiable multimodal deep research by orchestrating the entire report generation process, ensuring factual grounding and v…
Amy Xin, Jiening Siow, Junjie Wang, Zijun Yao +4 more
This paper presents EurekAgent, an environment-engineered agent system for metric-driven autonomous scientific discovery.
This survey provides a comprehensive analysis of Reasoning Language Model (RLM) adoption across 28 scientific disciplines, revealing significant disparities in RLM maturity across different scientific…
The paper introduces a Deep Research pipeline that significantly improves literature search recall and demonstrates that human-curated citation lists are often unreliable and do not serve as a true gr…
The paper develops a theoretically grounded framework for evaluating multilingual LLMs in Social Sciences and Humanities, moving beyond traditional NLP benchmarks to assess interpretive validity and c…
Xiaoqi He, Kaixin Lan, Mu You, Tao Fang +2 more
The paper proposes MACAT, a Multi-Agent Culture-Aware Translation framework, to selectively translate culture-loaded words in ancient Chinese texts, achieving superior performance over existing method…
Xinkai Ma, Zhiqi Bai, Dingling Zhang, Pei Liu +20 more
The paper introduces TVIR, a new benchmark and multi-agent framework for deep research, to evaluate and improve the generation of factually reliable, text-visual interleaved reports.
Ziyue Yang, Da Ma, Hanqi Li, Zijian Wang +7 more
DeepSurvey is an agentic system that significantly enhances automated survey generation by extracting deep, structured knowledge from full-text papers and rigorously validating citations, achieving su…
The paper introduces COMPOSE, a dual-graph framework that generates plausible future mathematical theorems by simultaneously conditioning a language model on both the scientific citation context and t…
The paper introduces Hyperparam, a set of lightweight JavaScript libraries designed to enable direct, model-aware querying of unstructured data (like agent traces) within client-side AI applications.
The paper introduces ProjectionBench, a novel benchmark that progressively discloses information to evaluate LLMs' ability to generate scientific hypotheses, demonstrating that advanced models like GP…
Weitong Qian, Beicheng Xu, Zhongao Xie, Bowen Fan +15 more
AutoSci is a memory-centric agentic system designed to automate the entire scientific research lifecycle by integrating structured memory, multi-stage execution, and continuous self-improvement.
Wenhang Shi, Jinhao Dong, Yiren Chen, Zhe Zhao +3 more
The paper introduces Grounded Agentic Interaction Synthesis (GAIS), a framework that generates high-quality, diverse, and complex agentic training data by anchoring tasks to real-world protocols, sign…