~ similar to 2605.30002· 20 results
The paper introduces an LLM-agent framework to solve the 'last-mile forecasting' problem, bridging the gap between raw statistical predictions and business-ready forecasts by incorporating weakly stru…
Yaxuan Kong, Qingren Yao, Yuqi Nie, Yichen Li +6 more
The paper introduces TimeSage-MT, a comprehensive multi-turn benchmark designed to rigorously test an LLM agent's ability to perform complex, evolving time series analysis, revealing critical gaps in…
Qiuyu Tian, Zequn Liu, Yingce Xia, Haojie Yin +1 more
The paper introduces ForeSci, a novel benchmark that evaluates LLM agents' ability to make forward-looking research judgments using only historical evidence, finding that explicit evidence organizatio…
The paper introduces Dr-CiK, a new benchmark designed to evaluate agents' ability to proactively discover, filter, and utilize relevant external context for time series forecasting, demonstrating that…
The paper introduces CosmicFish-HRM, a compact language model that achieves adaptive reasoning by dynamically allocating computational effort through a Hierarchical Reasoning Module (HRM), showing tha…
Zongsheng Cao, Bihao Zhan, Jinxin Shi, Jiong Wang +21 more
This paper introduces Agents-K1, an end-to-end knowledge orchestration pipeline that converts raw documents into agent-native scientific knowledge graphs.
Srivatsa Kundurthy, Clara Na, Colton Moraine, Anoushka Mohta +5 more
The paper introduces BlueFin, a challenging benchmark for evaluating LLM agents on complex financial spreadsheet tasks, finding that even frontier models perform poorly, scoring less than 50% on avera…
Yunhai Hu, Zining Liu, Xiangyang Yin, Tianhua Xia +4 more
DREAM-R is a novel framework that significantly enhances speculative reasoning in large multimodal models by optimizing draft generation alignment, introducing a robust verification mechanism, and ena…
Yang Zhang, En Chun, Ziyun Mao, Yulu Wu +1 more
GS-Fuse is a novel multimodal framework that improves financial forecasting by adaptively fusing event text and price data, achieving state-of-the-art performance by explicitly modeling the directiona…
Zhensheng Wang, Xiaole Liu, Wenmian Yang, Kun Zhou +2 more
The paper introduces Open-Domain Tabular Question Answering for Future Data Forecasting and Reasoning, a new dataset and framework that enables LLMs to perform time-series forecasting and reasoning on…
MOSAIC introduces a structured agentic framework that treats automated data science as a staged, context-grounded model selection problem, improving performance and traceability over traditional AutoM…
Seokju Cho, Ryo Hachiuma, Abhishek Badki, Hang Su +7 more
This paper proposes SpatialClaw, a training-free framework for spatial reasoning that enables open-ended, complex 3D/4D spatial reasoning.
Kewei Xu, Xiaoben Lu, Shuofei Qiao, Zihan Ding +3 more
The paper introduces LongDS, a new benchmark for long-horizon, multi-turn data analysis, demonstrating that current AI agents struggle significantly with maintaining and updating complex analytical st…
The paper investigates forecasting sparse and bursty vulnerability sightings, concluding that traditional time-series models like SARIMAX are inadequate, and count-based methods like Poisson regressio…
Ruiyi Zhang, Peijia Qin, Qi Cao, Li Zhang +1 more
The paper introduces AIBuildAI-2, a knowledge-enhanced agent that significantly improves the automatic building of AI models by integrating an external, evolving knowledge system, achieving state-of-t…
MOOSE-Copilot is a novel web-based framework that unifies scientific hypothesis discovery by formalizing human-AI interaction, significantly improving performance over autonomous LLM baselines.
The study compares agentic data retrieval using unstructured web data versus structured, semantically-annotated datasets, concluding that semantic metadata remains essential for high-precision, reliab…
The study demonstrates that domain adaptation primarily reshapes the linguistic explanatory framework of language models, causing shifts in cosmological stance secondarily, rather than directly modify…
This survey provides a comprehensive analysis of Reasoning Language Model (RLM) adoption across 28 scientific disciplines, revealing significant disparities in RLM maturity across different scientific…
The paper investigates how LLMs allocate their internal computational depth during multi-turn agentic planning, finding that agents progressively recruit deeper layers and shift toward corrective upda…