Papers similar to 2606.14672

20 results

cs.AI · Recent · May 27, 2026

AsyncTool: Evaluating the Asynchronous Function Calling Capability under Multi-Task Scenarios

Kou Shi, Ziao Zhang, Shiting Huang, Avery Nie +6 more

The paper introduces AsyncTool, a new benchmark designed to evaluate LLM agents' ability to handle multiple, concurrent tasks with delayed tool feedback, demonstrating that asynchronous coordination i…

cs.AI · Recent · May 28, 2026

Make LLM Learn to Synthesize from Streaming Experiences through Feedback

Zhenlin Hu, Yan Wang, Zhen Bi, Zihao Xue +6 more

The paper introduces StreamSynth, a sequential setting for synthetic data generation, and proposes SynLearner, a framework that enables LLMs to improve synthesis performance by accumulating and transf…

cs.AI · Recent · May 31, 2026

ANDES: Agent Native Data Evolving Synthesis Tool for Autonomous Instruction Alignment

Zhengyang Zhao, Shengjie Ye, Lu Ma, Hao Liang +2 more

The paper introduces Andes, a framework that treats data generation as a plug-and-play agent skill, enabling autonomous alignment of LLMs by providing an intelligent, closed-loop data synthesis interf…

cs.CL · Recent · Jun 1, 2026

Scaling Agentic Capabilities via Grounded Interaction Synthesis

Wenhang Shi, Jinhao Dong, Yiren Chen, Zhe Zhao +3 more

The paper introduces Grounded Agentic Interaction Synthesis (GAIS), a framework that generates high-quality, diverse, and complex agentic training data by anchoring tasks to real-world protocols, sign…

cs.AI · Recent · May 28, 2026

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

Tong Ye, Hang Yu, Tengfei Ma, Xuhong Zhang +5 more

The paper introduces DOMINO, a novel inductive framework that synthesizes domain-specific data for LLMs using only reference examples, significantly improving performance on challenging, implicitly de…

cs.AI · Recent · May 31, 2026

"Skill issues": data-centric optimization of lakehouse agents

Nicole Rose Schneider, Davide Ghilardi, Giacomo Piccinini, Jacopo Tagliabue

The paper introduces a data-centric optimization pipeline to improve coding agents' ability to interact with a branching lakehouse, showing significant accuracy gains by treating agent evaluation as a…

cs.AI · Recent · May 27, 2026

A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

Tomer Keren, Nitay Calderon, Asaf Yehudai, Yotam Perlitz +2 more

The paper introduces TASTE, an automatic task synthesis method that generates challenging agent benchmarks by evolving tool sequences, demonstrating that existing benchmarks are saturated and that TAS…

cs.CL, cs.AI, cs.IR · Recent · May 28, 2026

Exploring Autonomous Agentic Data Engineering for Model Specialization

Yujie Luo, Xiangyuan Ru, Jingsheng Zheng, Jingjing Wang +9 more

The paper introduces Autonomous Agentic Data Engineering, demonstrating that LLMs can autonomously plan and optimize end-to-end data curation pipelines, leading to substantial performance gains in spe…

cs.MA, cs.CL, cs.LG · Recent · Jun 1, 2026

Multi-Agent Computer Use

Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried

The paper proposes Multi-Agent Computer Use (MACU) systems, which significantly improve performance on complex, long-horizon tasks by enabling parallel execution and dynamic task decomposition compare…

cs.LG, cs.AI · Recent · May 29, 2026

Learning to Construct Practical Agentic Systems

Aditya Kumar, Zhihan Lei, Jerry Yan, Joshua W. Momo +5 more

The paper proposes a modular agent framework and novel learning methods to design and optimize practical, cost-effective, and controllable LLM-based agentic systems.

cs.AI · Recent · May 27, 2026

OR-Space: A Full-Lifecycle Workspace Benchmark for Industrial Optimization Agents

Chenyu Zhou, Xinyun Lu, Jiangyue Zhao, Jianghao Lin +2 more

The paper introduces OR-Space, a novel full-lifecycle workspace benchmark designed to rigorously evaluate industrial optimization agents by simulating real-world, multi-stage OR workflows that go beyo…

cs.AI, cs.LG · Recent · May 30, 2026

MOSAIC: Modular Orchestration for Structured Agentic Intelligence and Composition

Yifan Bao, Xinyu Xi, Xinyu Liu, Wen Ge +7 more

MOSAIC introduces a structured agentic framework that treats automated data science as a staged, context-grounded model selection problem, improving performance and traceability over traditional AutoM…

cs.AI · Recent · May 27, 2026

Beyond Consensus: Trace-Level Synthesis in Mixture of Agents

Shreyas Fadnavis, Praitayini Kanakaraj, Felix Wyss

The paper proposes using an LLM aggregator that analyzes complete reasoning traces, demonstrating that trace-level synthesis is superior to traditional consensus methods like majority voting for solvi…

cs.AI, cs.DB · Recent · May 27, 2026

A Query Engine for the Agents

Kenny Daniel

The paper introduces Hyperparam, a set of lightweight JavaScript libraries designed to enable direct, model-aware querying of unstructured data (like agent traces) within client-side AI applications.

cs.DC, cs.AI · Recent · Jun 1, 2026

Not All Errors Are Equal: A Systematic Study of Error Propagation in Large Language Model Inference

Yafan Huang, Sheng Di, Guanpeng Li

This paper systematically studies how soft errors propagate during Large Language Model (LLM) inference using a novel fault-injection framework, providing critical insights and mitigation strategies f…

cs.DC, cs.AI, cs.LG · Recent · May 31, 2026

Leyline: KV Cache Directives for Agentic Inference

Bole Ma, Jan Eitzinger, Harald Koestler

Leyline introduces a novel serving-side primitive that allows agentic LLMs to perform targeted, efficient edits to the KV cache, avoiding costly full re-prefilling after content modification.

cs.CL, cs.SE · Recent · May 29, 2026

Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

Jiasheng Zheng, Boxi Cao, Boxi Yu, Yuzhong Zhang +5 more

The paper introduces Atomic Decomposition and Recombination (ADR), a framework that generates genuinely novel and challenging verifiable code tasks, significantly improving the scalability of Re…

cs.SE, cs.AI, cs.CL · Recent · May 29, 2026

BlueFin: Benchmarking LLM Agents on Financial Spreadsheets

Srivatsa Kundurthy, Clara Na, Colton Moraine, Anoushka Mohta +5 more

The paper introduces BlueFin, a challenging benchmark for evaluating LLM agents on complex financial spreadsheet tasks, finding that even frontier models perform poorly, scoring less than 50% on avera…

cs.MA, cs.AI · Recent · May 28, 2026

Evolve as a Team: Collaborative Self-Evolution for LLM-based Multi-Agent Systems

Zhezheng Hao, Tianfu Wang, Huanshuo Dong, Ziyan Liu +6 more

The paper proposes Meta-Team, an experience-driven framework that enables multi-agent systems (MAS) to collaboratively self-evolve by transforming complex execution experiences into reusable improveme…

cs.CL · Recent · May 29, 2026

ExpGraph: Model-Agnostic Experience Learning with Graph-Structured Memory for LLM Agents

Tao Feng, Chongrui Ye, Tianyang Luo, Jingjun Xu +7 more

ExpGraph is a model-agnostic framework that uses a self-evolving experience graph to enable LLM agents to reuse past successful strategies and failure lessons, significantly improving performance acro…
