~ similar to 2605.30333· 19 results
This survey provides a comprehensive analysis of Reasoning Language Model (RLM) adoption across 28 scientific disciplines, revealing significant disparities in RLM maturity across different scientific…
The paper introduces SPIRE, a multi-agent framework designed to extend LLM research capabilities to the humanities by enabling evidence-grounded interpretive reasoning over primary sources.
The paper introduces ProjectionBench, a novel benchmark that progressively discloses information to evaluate LLMs' ability to generate scientific hypotheses, demonstrating that advanced models like GP…
The paper introduces a typed claim network that models cross-document references by explicitly labeling the stance (e.g., agreement, disagreement) of a citation, significantly improving downstream tas…
DenseSteer is a training-free inference-time framework that improves the math reasoning capabilities of small language models by steering their internal representations toward a 'Dense Reasoning' patt…
Ahmad Rammal, Niket Patel, Fabian Gloeckle, Amaury Hayat +4 more
The paper introduces AutoformBot, a multi-agent system that successfully autoformalizes a large corpus of open-access graduate-level mathematics textbooks into a verified library in Lean 4, demonstrat…
The paper introduces Iteris, an agentic research system, demonstrating its capability to generate numerical evidence, constructions, and proof drafts for open problems in computational mathematics, re…
The paper introduces FormInv, a measurement protocol that reveals significant semantic inconsistencies in existing mathematical reasoning benchmarks, showing that standard accuracy metrics fail to cap…
Pengyu Chen, Yonggang Zhang, Mingming Chen, Jun Song +2 more
The paper proposes a graph-constrained approach to scale multi-hop training data by decoupling path discovery from path verbalization, significantly expanding the usable corpus size for LLMs.
Xu Li, Hanzhe Tu, Xinyi Li, Kuncheng Zhao +2 more
EvoGens is an evolution-inspired framework that treats scientific idea generation as an evolutionary search, significantly boosting the novelty and diversity of generated research ideas compared to ex…
Yongsik Seo, Wooseok Jeong, Eunyoung Kim, Hyeonseo Jang +1 more
The paper introduces CITETRACE, a large-scale dataset and evaluation framework that systematically measures structural citation failures in search-augmented LLMs, revealing a pattern called Verified M…
Shashi Kumar, Yacouba Kaloga, Petr Motlicek, Ina Kodrasi +1 more
The paper introduces Geometric Latent Reasoning (GLR), a method that models reasoning as continuous paths in the embedding space, showing that this continuous approach allows LLMs to solve problems us…
Zongsheng Cao, Bihao Zhan, Jinxin Shi, Jiong Wang +21 more
This paper introduces Agents-K1, an end-to-end knowledge orchestration pipeline that converts raw documents into agent-native scientific knowledge graphs.
MOOSE-Copilot is a novel web-based framework that unifies scientific hypothesis discovery by formalizing human-AI interaction, significantly improving performance over autonomous LLM baselines.
The paper introduces I-WebGenBench, a framework and benchmark that converts static scientific papers into executable, interactive web systems, allowing users to dynamically explore the paper's mechani…
The paper proposes a category-theoretic framework for agentic AI that models scientific discovery not as answer generation, but as a verifiable transition and revision of the underlying representation…
The paper introduces Expected Value Alignment (EVA), a novel reward modeling procedure that allows continuous scoring of intermediate reasoning steps in formal mathematics verification while maintaini…
The paper develops a theoretically grounded framework for evaluating multilingual LLMs in Social Sciences and Humanities, moving beyond traditional NLP benchmarks to assess interpretive validity and c…
The paper introduces ProvMind, a provenance-grounded reasoning framework that significantly improves materials synthesis process optimization by accurately predicting optimal synthesis routes under ch…