Papers similar to 2606.02497

Similar to 2606.02497 · 20 results

cs.AI · Recent · May 28, 2026

KairosAgent: Agentic Time Series Forecasting with Fused Semantic Reasoning

Kun Feng, Ziwei Shan, Yuchen Fang, Yiyang Tan +5 more

KairosAgent is a novel agentic framework that combines Large Language Models (LLMs) for semantic reasoning and Time Series Foundation Models (TSFMs) for numerical forecasting, achieving superior multi…

cs.AI · Recent · May 30, 2026

ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment

Qiuyu Tian, Zequn Liu, Yingce Xia, Haojie Yin +1 more

The paper introduces ForeSci, a novel benchmark that evaluates LLM agents' ability to make forward-looking research judgments using only historical evidence, finding that explicit evidence organizatio…

cs.AI · Recent · May 31, 2026

Can LLM Agents Sustain Long-Horizon Organizational Dynamics?

Xuancheng Zhu, Yang Yue, Shuaibing Wan, Zihan Dou +3 more

The paper introduces TaskWeave, a hierarchical agentic framework that successfully simulates long-horizon organizational dynamics by treating coordination as a memory-centered problem, demonstrating t…

cs.CL, cs.AI · Recent · May 31, 2026

TimeSage-MT: A Multi-Turn Benchmark for Evaluating Agentic Time Series Reasoning

Yaxuan Kong, Qingren Yao, Yuqi Nie, Yichen Li +6 more

The paper introduces TimeSage-MT, a comprehensive multi-turn benchmark designed to rigorously test an LLM agent's ability to perform complex, evolving time series analysis, revealing critical gaps in…

cs.AI, cs.LG · Recent · May 27, 2026

Dr-CiK: A Testbed for Foresight-Driven Agents

Yihong Tang, Andrew Robert Williams, Arjun Ashok, Vincent Zhihao Zheng +5 more

The paper introduces Dr-CiK, a new benchmark designed to evaluate agents' ability to proactively discover, filter, and utilize relevant external context for time series forecasting, demonstrating that…

cs.CL, cs.AI, cs.LG · Recent · May 28, 2026

Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents

Alejandra Zambrano, Sara Vera Marjanovic, Imene Kerboua, Xing Han Lù +1 more

This paper empirically demonstrates that the choice of plan representation (e.g., checklist vs. narrative) significantly impacts the robustness and success rate of LLM-based web agents.

cs.LG, cs.AI, cs.CL · Recent · May 28, 2026

LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis

Kewei Xu, Xiaoben Lu, Shuofei Qiao, Zihan Ding +3 more

The paper introduces LongDS, a new benchmark for long-horizon, multi-turn data analysis, demonstrating that current AI agents struggle significantly with maintaining and updating complex analytical st…

cs.AI · Recent · May 28, 2026

Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling

Shijie Cao, Yuan Yuan, Jing Liu

RACE-Sched is an asynchronous agentic framework that successfully integrates low-latency, real-time scheduling decisions with advanced, long-horizon reasoning provided by Large Language Models.

cs.AI · Recent · May 27, 2026

When Does Memory Help Multi-Trajectory Inference for Tool-Use LLM Agents?

Xinzhe Li, Yaguang Tao

The paper proposes a unified framework to evaluate how different types of memory transfer benefit multi-trajectory inference for tool-use LLM agents, finding that the optimal memory method depends cri…

cs.AI · Recent · May 27, 2026

Deconstructing Spatial Complexity: Hierarchical Decomposition for LLM Spatial Reasoning

Yi Wang, Haojie Lu, Zhaofan Zhang, Li Chen +1 more

This paper introduces MCTS-Guided Group Relative Policy Optimization (M-GRPO) to enhance LLM spatial reasoning by improving the decomposition of complex tasks into optimal sub-tasks.

cs.MA, cs.AI · Recent · May 29, 2026

Design and Evaluation of Multi-Agent AI Oracle Systems for Prediction Market Resolution

Tarun Kota

The paper evaluates multi-agent LLM oracle systems for prediction market resolution, finding that independent aggregation with confidence-weighted voting significantly outperforms single-model baselin…

cs.AI · Recent · May 27, 2026

OR-Space: A Full-Lifecycle Workspace Benchmark for Industrial Optimization Agents

Chenyu Zhou, Xinyun Lu, Jiangyue Zhao, Jianghao Lin +2 more

The paper introduces OR-Space, a novel full-lifecycle workspace benchmark designed to rigorously evaluate industrial optimization agents by simulating real-world, multi-stage OR workflows that go beyo…

cs.AI · Recent · May 31, 2026

TravelEval: A Comprehensive Benchmarking Framework for Evaluating LLM-Powered Travel Planning Agents

Weiyi Chen, Shuaixiong Wang, Ziyun Gao, Kaichun Hu +4 more

The paper introduces TravelEval, a comprehensive, six-dimensional benchmarking framework that evaluates LLM-powered travel plans using realistic spatio-temporal simulation, revealing that current LLMs…

cs.AI, cs.CL, cs.CY · Recent · Jun 1, 2026

SafeMCP: Proactive Power Regulation for LLM Agent Defense via Environment-Grounded Look-Ahead Reasoning

Lichao Wang, Zhaoxing Ren, Tianzhuo Yang, Jiaming Ji +3 more

SafeMCP is a server-side defense plugin that uses look-ahead reasoning to proactively filter and constrain tool acquisition for LLM agents, thereby mitigating catastrophic risks associated with expand…

cs.AI · Recent · May 29, 2026

Learning Agent-Compatible Context Management for Long-Horizon Tasks

Lu Yi, Runlin Lei, Liuyi Yao, Yuexiang Xie +5 more

The paper introduces Adaptive Context Management (AdaCoM), an external context manager that uses reinforcement learning to improve the performance of frozen LLM agents on long-horizon tasks by intelli…

cs.AI, cs.LG · Recent · May 30, 2026

MOSAIC: Modular Orchestration for Structured Agentic Intelligence and Composition

Yifan Bao, Xinyu Xi, Xinyu Liu, Wen Ge +7 more

MOSAIC introduces a structured agentic framework that treats automated data science as a staged, context-grounded model selection problem, improving performance and traceability over traditional AutoM…

cs.AI · Recent · May 27, 2026

Do Agents Think Deeper? A Mechanistic Investigation of Layer-Wise Dynamics in Sequential Planning

Zhenyu Cui, Xiangzhong Luo

The paper investigates how LLMs allocate their internal computational depth during multi-turn agentic planning, finding that agents progressively recruit deeper layers and shift toward corrective upda…

cs.LG, cs.AI · Recent · May 27, 2026

Online Irregular Multivariate Time Series Forecasting via Uncertainty-Driven Dual-Expert Calibration

Haonan Wen, Hanyang Chen, Songhe Feng

The paper proposes Under-Cali, an uncertainty-driven dual-expert calibration framework, to achieve stable and efficient online forecasting for irregularly sampled multivariate time series.

cs.LG, cs.AI · Recent · May 29, 2026

Learning to Construct Practical Agentic Systems

Aditya Kumar, Zhihan Lei, Jerry Yan, Joshua W. Momo +5 more

The paper proposes a modular agent framework and novel learning methods to design and optimize practical, cost-effective, and controllable LLM-based agentic systems.

cs.CL · Recent · Jun 1, 2026

Unified Context Evolution for LLM Agents

Zixuan Zhu, Yitong Hu, Yong Dai, Junfeng Fang +3 more

The paper introduces Unified Context Evolution (UCE), a gradient-free framework that externalizes and manages agent experience into a typed, evolving library, significantly improving performance on mu…
