Papers similar to 2605.27904

~ similar to 2605.27904· 20 results

cs.AIRecentJun 1, 2026

Bridging the Last Mile of Time Series Forecasting with LLM Agents

Yuhua Liao, Zetian Wang, Qiangqiang Nie, Zhenhua Zhang

The paper introduces an LLM-agent framework to solve the 'last-mile forecasting' problem, bridging the gap between raw statistical predictions and business-ready forecasts by incorporating weakly stru…

View →

cs.AIRecentMay 30, 2026

ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment

Qiuyu Tian, Zequn Liu, Yingce Xia, Haojie Yin +1 more

The paper introduces ForeSci, a novel benchmark that evaluates LLM agents' ability to make forward-looking research judgments using only historical evidence, finding that explicit evidence organizatio…

View →

cs.LGcs.AIRecentJun 1, 2026

Why Do Time Series Models Need Long Context Windows?

Luca Butera, Giovanni De Felice, Andrea Cini, Cesare Alippi

The paper argues that long context windows are necessary for time series forecasting not just to capture long-range dependencies, but primarily to reduce uncertainty about the underlying data-generati…

View →

cs.CLcs.AIRecentMay 31, 2026

TimeSage-MT: A Multi-Turn Benchmark for Evaluating Agentic Time Series Reasoning

Yaxuan Kong, Qingren Yao, Yuqi Nie, Yichen Li +6 more

The paper introduces TimeSage-MT, a comprehensive multi-turn benchmark designed to rigorously test an LLM agent's ability to perform complex, evolving time series analysis, revealing critical gaps in…

View →

cs.AIRecentMay 28, 2026

KairosAgent: Agentic Time Series Forecasting with Fused Semantic Reasoning

Kun Feng, Ziwei Shan, Yuchen Fang, Yiyang Tan +5 more

KairosAgent is a novel agentic framework that combines Large Language Models (LLMs) for semantic reasoning and Time Series Foundation Models (TSFMs) for numerical forecasting, achieving superior multi…

View →

cs.CLcs.AIcs.IRRecentMay 29, 2026

Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism

Haoxiang Zhang, Qixin Xu, Zhuofeng Li, Lei Zhang +3 more

The paper analyzes observation masking in long-horizon search agents, finding that its effectiveness depends on a complex interaction between the model's capacity and the retriever's strength, exhibit…

View →

cs.CLcs.AIcs.LGRecentMay 29, 2026

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li

LongTraceRL addresses long-context reasoning challenges by generating highly challenging training data and introducing a fine-grained rubric reward, significantly improving evidence-grounded reasoning…

View →

cs.IRcs.AIcs.CLRecentJun 1, 2026

ODTQA-FoRe: An Open-Domain Tabular Question Answering Dataset for Future Data Forecasting and Reasoning

Zhensheng Wang, Xiaole Liu, Wenmian Yang, Kun Zhou +2 more

The paper introduces Open-Domain Tabular Question Answering for Future Data Forecasting and Reasoning, a new dataset and framework that enables LLMs to perform time-series forecasting and reasoning on…

View →

cs.LGcs.CRRecentApr 13, 2026

INTARG: Informed Real-Time Adversarial Attack Generation for Time-Series Regression

Gamze Kirman Tokgoz, Onat Gungor, Tajana Rosing, Baris Aksanli

The paper proposes INTARG, an informed and selective adversarial attack framework for time-series forecasting that significantly increases prediction error by targeting only the most vulnerable time s…

View →

cs.CLcs.AIRecentMay 28, 2026

Predicting Causal Effects from Natural Language Queries using Structured Representations

Giuliano Martinelli, Piriyakorn Piriyatamwong, Abelardo Carlos Martinez Lorenzo, Jasmin Baier +6 more

The paper introduces Query2Effect, a large-scale benchmark, and a two-step framework to predict causal effect sizes from natural language queries, showing that structured representation significantly…

View →

cs.LGcs.AIcs.CLRecentMay 28, 2026

LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis

Kewei Xu, Xiaoben Lu, Shuofei Qiao, Zihan Ding +3 more

The paper introduces LongDS, a new benchmark for long-horizon, multi-turn data analysis, demonstrating that current AI agents struggle significantly with maintaining and updating complex analytical st…

View →

cs.AIRecentMay 29, 2026

Learning Agent-Compatible Context Management for Long-Horizon Tasks

Lu Yi, Runlin Lei, Liuyi Yao, Yuexiang Xie +5 more

The paper introduces Adaptive Context Management (AdaCoM), an external context manager that uses reinforcement learning to improve the performance of frozen LLM agents on long-horizon tasks by intelli…

View →

cs.LGcs.AIRecentMay 29, 2026

BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding

Liang He, Jingbo Wen, Qishi Zhan, Yixiong Chen +3 more

BudgetDraft introduces an acceptance-aware multi-view training method that trains a sparse-KV speculative decoder to maintain high acceptance rates across varying context lengths and sparsity levels,…

View →

cs.AIRecentJun 1, 2026

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Jiaming Wang, Ziteng Feng, Jiangtao Wu, Ruihao Li +7 more

The paper introduces TELBench and the DRIFT framework to enable fine-grained, span-level error localization in deep-research agents, significantly improving the ability to pinpoint exactly where an ag…

View →

cs.CLcs.IRRecentMay 29, 2026

Beyond Static Dialogues: Benchmarking Realistic, Heterogeneous, and Evolving Long-Term Memory

Han Zhang, Zihao Tang, Xin Yu, Xiao Liu +7 more

The paper introduces RHELM, a new benchmark designed to test LLMs' long-term memory by simulating realistic, complex, and evolving dialogues that integrate multiple heterogeneous data sources.

View →

cs.CLcs.AIcs.LGRecentMay 28, 2026

Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents

Alejandra Zambrano, Sara Vera Marjanovic, Imene Kerboua, Xing Han Lù +1 more

This paper empirically demonstrates that the choice of plan representation (e.g., checklist vs. narrative) significantly impacts the robustness and success rate of LLM-based web agents.

View →

cs.IRcs.AIRecentMay 29, 2026

DynaTree: Dynamic Agentic Retrieval Tree for Time-Sensitive News Retrieval

Siyuan Qi, Xinyuan Wang, Yingxuan Yang, Haochuan Guo +4 more

DynaTree introduces a two-stage framework that pre-constructs a reusable retrieval tree offline using coordinated agents, allowing for efficient, structure-aware, and highly effective time-sensitive n…

View →

cs.CRRecentApr 17, 2026

Modeling Sparse and Bursty Vulnerability Sightings: Forecasting Under Data Constraints

Cedric Bonhomme, Alexandre Dulaunoy

The paper investigates forecasting sparse and bursty vulnerability sightings, concluding that traditional time-series models like SARIMAX are inadequate, and count-based methods like Poisson regressio…

View →

cs.AIcs.LGstat.MERecentMay 29, 2026

Industrializing Prediction-Powered Inference: The GLIDE Library for Reliable GenAI and Agentic Systems Evaluation

Grégoire Martinon, Ibrahim Merad, Mohammed Raki

The paper introduces GLIDE, an open-source Python library that unifies multiple state-of-the-art Prediction-Powered Inference (PPI) estimators and samplers to provide reliable, debiased estimates and…

View →

cs.AIRecentMay 27, 2026

ZipRL: Adaptive Multi-Turn Context Compression with Hindsight Response Replay

Zhexin Hu, Li Wang, Xiaohan Wang, Jiajun Chai +3 more

ZipRL introduces an adaptive context compression framework that significantly improves the performance and efficiency of LLMs in complex, multi-turn agent tasks by combining multi-granularity compress…

View →