Papers similar to 2606.01533

~ similar to 2606.01533· 20 results

cs.AIcs.CLcs.CRRecentApr 9, 2026

ACIArena: Toward Unified Evaluation for Agent Cascading Injection

Hengyu An, Minxi Li, Jinghuai Zhang, Naen Xu +5 more

The paper introduces ACIArena, a unified and comprehensive evaluation framework designed to systematically test the robustness of Multi-Agent Systems against complex Agent Cascading Injection attacks.

View →

cs.AIRecentMay 29, 2026

Learning Agent-Compatible Context Management for Long-Horizon Tasks

Lu Yi, Runlin Lei, Liuyi Yao, Yuexiang Xie +5 more

The paper introduces Adaptive Context Management (AdaCoM), an external context manager that uses reinforcement learning to improve the performance of frozen LLM agents on long-horizon tasks by intelli…

View →

cs.MAcs.AIRecentMay 28, 2026

Evolve as a Team: Collaborative Self-Evolution for LLM-based Multi-Agent Systems

Zhezheng Hao, Tianfu Wang, Huanshuo Dong, Ziyan Liu +6 more

The paper proposes Meta-Team, an experience-driven framework that enables multi-agent systems (MAS) to collaboratively self-evolve by transforming complex execution experiences into reusable improveme…

View →

cs.AIRecentMay 30, 2026

CoMIC: Collaborative Memory and Insights Circulation for Long-Horizon LLM Agents in Cloud-Edge Systems

Yannan Wang, Longli Yang, Zhen Liu, Abhishek Kumar +1 more

CoMIC is a cloud-edge framework that enables resource-constrained LLM agents to successfully complete complex, long-horizon tasks by collaboratively sharing and refining memory and insights between lo…

View →

cs.AIRecentMay 29, 2026

Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

Hai Lin

The paper proposes the Intelligent Computing Architecture Model (ICAM), a six-layer framework that unifies disparate concepts in model-native computing by viewing the LLM stack through a dual-plane ar…

View →

cs.AIRecentMay 31, 2026

Can LLM Agents Sustain Long-Horizon Organizational Dynamics?

Xuancheng Zhu, Yang Yue, Shuaibing Wan, Zihan Dou +3 more

The paper introduces TaskWeave, a hierarchical agentic framework that successfully simulates long-horizon organizational dynamics by treating coordination as a memory-centered problem, demonstrating t…

View →

cs.AIRecentMay 27, 2026

AsyncTool: Evaluating the Asynchronous Function Calling Capability under Multi-Task Scenarios

Kou Shi, Ziao Zhang, Shiting Huang, Avery Nie +6 more

The paper introduces AsyncTool, a new benchmark designed to evaluate LLM agents' ability to handle multiple, concurrent tasks with delayed tool feedback, demonstrating that asynchronous coordination i…

View →

cs.CLcs.AIcs.LGRecentMay 28, 2026

Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents

Alejandra Zambrano, Sara Vera Marjanovic, Imene Kerboua, Xing Han Lù +1 more

This paper empirically demonstrates that the choice of plan representation (e.g., checklist vs. narrative) significantly impacts the robustness and success rate of LLM-based web agents.

View →

cs.CLRecentJun 1, 2026

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Nahyun Lee, Dongkeun Yoon, Guijin Son, Geewook Kim +11 more

The paper introduces K-BrowseComp, a new web-browsing agent benchmark of 400 problems grounded in Korean contexts, demonstrating that current frontier LLMs struggle significantly with complex, context…

View →

cs.MAcs.AIcs.CYRecentMay 30, 2026

Scaling Behavior of Single LLM-Driven Multi-Agent Systems

Jialing Li, Zhouhong Gu, Yin Cai, Hongwei Feng

This paper investigates the scaling behavior of homogeneous LLM-driven Multi-Agent Systems (MAS) and finds that performance exhibits diminishing returns due to coordination overhead, rather than scali…

View →

cs.MAcs.CRcs.LGRecentApr 25, 2026

Architecture Matters for Multi-Agent Security

Ben Hagag, William L. Anderson, Christian Schroeder de Witt, Sarah Scheffler

This paper empirically demonstrates that the architectural design of multi-agent systems significantly impacts their security, finding that coordination mechanisms can introduce vulnerabilities greate…

View →

cs.AIRecentMay 28, 2026

Enhancing Multi-Agent Communication through Attention Steering with Context Relevance

Hongxiang Zhang, Yuan Tian, Tianyi Zhang

The paper introduces Agent-Radar, a training-free method that dynamically steers multi-agent attention toward relevant context using a novel decay mechanism, significantly improving performance in lon…

View →

cs.AIRecentMay 31, 2026

"Skill issues'': data-centric optimization of lakehouse agents

Nicole Rose Schneider, Davide Ghilardi, Giacomo Piccinini, Jacopo Tagliabue

The paper introduces a data-centric optimization pipeline to improve coding agents' ability to interact with a branching lakehouse, showing significant accuracy gains by treating agent evaluation as a…

View →

cs.OScs.AIcs.CRRecentJun 2, 2026

Agent libOS: A Library-OS-Inspired Runtime for Long-Running, Capability-Controlled LLM Agents

Yingqi Zhang

Agent libOS introduces a library-OS-inspired runtime substrate that treats LLM agents as schedulable processes, providing explicit capability control and robust auditing for long-running, stateful age…

View →

cs.CLcs.AIcs.MARecentJun 3, 2026

Streaming Communication in Multi-Agent Reasoning

Zhen Yang, Xiaogang Xu, Wen Wang, Cong Chen +2 more

The paper introduces StreamMA, a streaming multi-agent reasoning system that significantly reduces latency and improves effectiveness by passing reasoning steps to downstream agents as they are genera…

View →

cs.CRcs.AIcs.CLRecentMay 14, 2026

Web Agents Should Adopt the Plan-Then-Execute Paradigm

Julien Piet, Annabella Chow, Yiwei Hou, Muxi Lyu +4 more

The paper argues that web agents should abandon the reactive ReAct paradigm in favor of a plan-then-execute approach, which requires developing typed, task-level APIs to properly structure web interac…

View →

cs.LGcs.AIcs.CLRecentMay 28, 2026

LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis

Kewei Xu, Xiaoben Lu, Shuofei Qiao, Zihan Ding +3 more

The paper introduces LongDS, a new benchmark for long-horizon, multi-turn data analysis, demonstrating that current AI agents struggle significantly with maintaining and updating complex analytical st…

View →

cs.AIcs.CLRecentMay 28, 2026

GTA: Generating Long-Horizon Tasks for Web Agents at Scale

Tenghao Huang, Kung-Hsiang Huang, Prafulla Kumar Choubey, Yilun Zhou +3 more

The paper introduces GTA, a scalable framework for generating realistic, multi-hop web-agent tasks with dense, executable trajectories, addressing the current lack of process-level supervision in web…

View →

cs.CLRecentJun 1, 2026

Scaling Agentic Capabilities via Grounded Interaction Synthesis

Wenhang Shi, Jinhao Dong, Yiren Chen, Zhe Zhao +3 more

The paper introduces Grounded Agentic Interaction Synthesis (GAIS), a framework that generates high-quality, diverse, and complex agentic training data by anchoring tasks to real-world protocols, sign…

View →

cs.AIRecentMay 27, 2026

A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

Tomer Keren, Nitay Calderon, Asaf Yehudai, Yotam Perlitz +2 more

The paper introduces TASTE, an automatic task synthesis method that generates challenging agent benchmarks by evolving tool sequences, demonstrating that existing benchmarks are saturated and that TAS…

View →