Agents

Autonomous agents, planning, tool use, and agentic workflows

20 papers indexed

cs.NIcs.AISurveyRecentJul 17, 2026

LLM-Powered Agentic AI for 5G/6G Networks: A Tutorial and Survey on Architectures, Protocols, and Standardization

Mazene Ameur, Abdelkader Mekrache, Bouziane Brik, Adlen Ksentini

This paper presents a tutorial-and-survey on integrating agentic AI into Next-Generation Networks (NGNs), addressing the gap in protocol integration, evaluation, and standardization alignment.

View →

cs.AIRecentMay 27, 2026

Do Agents Think Deeper? A Mechanistic Investigation of Layer-Wise Dynamics in Sequential Planning

Zhenyu Cui, Xiangzhong Luo

The paper investigates how LLMs allocate their internal computational depth during multi-turn agentic planning, finding that agents progressively recruit deeper layers and shift toward corrective upda…

View →

cs.AIcs.MAeess.SYSurveyRecentJul 20, 2026

Engineering Trustworthy Agentic AI for Critical Systems

Omar Al-Refai, Ibrahim Shahbaz, Adam Ali Husseinat, Michael Mandulak +2 more

This paper surveys the trustworthiness of agentic AI systems in engineering domains, organizing it around five dimensions: safety, robustness, transparency, accountability, and privacy.

View →

cs.CRcs.MMRecentMay 26, 2026

AgenticVBench: Can AI Agents Complete Real-World Post-Production Tasks?

Zongheng Cao, Yi Zheng, Rui Song, Xinyu Hu

The paper introduces AgenticVBench, a comprehensive benchmark of 100 real-world video post-production tasks, and finds that even the best AI agents perform significantly worse than human experts on th…

View →

cs.AIRecentMay 30, 2026

Acting with AI: An Interaction-Based Framework for Agentic Tort Liability

Yiheng Yao

The paper proposes an interaction-based legal framework for assigning tort liability when autonomous AI systems cause harm, categorizing liability based on the nature of the human-AI interaction.

View →

cs.CLcs.AIRecentMay 28, 2026

Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation

Chenghao Zhang, Guanting Dong, Yufan Liu, Tong Zhao +1 more

The paper introduces extsc{Ptah}, a multi-agent harness designed to improve verifiable multimodal deep research by orchestrating the entire report generation process, ensuring factual grounding and v…

View →

cs.AIcs.CLcs.CRRecentMay 17, 2026

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

Jinhu Qi, Muzhi Li, Jiahong Liu, Yuqin Shu +8 more

This survey provides a comprehensive, practical guide to ensuring the trustworthiness of complex, autonomous agentic AI systems by focusing on safety, robustness, privacy, and system security.

View →

cs.NIcs.AIcs.CRRecentMay 12, 2026

Large Language Models for Agentic NetOps and AIOps: Architectures, Evaluation, and Safety

Muhammad Bilal, Jon Crowcroft, Ruizhi Wang, Xiaolong Xu +1 more

The paper surveys the use of LLMs for agentic NetOps and AIOps, arguing that operational reliability depends not on the model itself, but on robust surrounding machinery and workflow-centered evaluati…

View →

cs.CLRecentMay 30, 2026

Momento: Evaluating Persistent Memory and Reasoning with Multi-Session Agentic Conversations

Adril Putra Merin, David Anugraha, Ayu Purwarianti, Genta Indra Winata

The paper introduces Momento, a new benchmark that evaluates agentic AI's ability to maintain state and reason across multiple, disconnected sessions, revealing that current agents struggle with integ…

View →

cs.AIcs.LGRecentJun 1, 2026

Characterization of Multi-Model Agentic AI Systems on General Tasks via Trace-Driven Simulation

Donghwan Kim, Prakhar Singh, Younghoon Min, Jongryool Kim +2 more

The paper introduces GAIATrace, a comprehensive token-level dataset, and Vidur-Agent, a simulator, to enable reproducible and detailed system-level characterization of complex multi-model agentic AI s…

View →

cs.CRcs.AIRecentMay 22, 2026

When the Manual Lies: A Realistic Benchmark to Evaluate MCP Poisoning Attacks for LLM Agents

Shi Liu, Xuehai Tang, Xikang Yang, Liang Lin +3 more

This paper introduces a new benchmark to test Tool Description Poisoning (TDP) attacks on LLM agents, demonstrating that even advanced models like GPT-4o are highly vulnerable and that current defense…

View →

cs.AIRecentMay 27, 2026

VeriTrip: A Verifiable Benchmark for Travel Planning Agents over Unstructured Web Corpora

Yuting Xu, Jiayi Tian, Jian Liang, Xin Xiong +3 more

The paper introduces VeriTrip, a new verifiable benchmark that evaluates travel planning agents' ability to perform evidence-grounded reasoning over complex, unstructured, and multimodal web data, rev…

View →

cs.SEcs.LGEmpiricalRecentJul 9, 2026

TTHE: Test-Time Harness Evolution

Jun Nie, Yonggang Zhang, Jun Song, Qianshu Cai +4 more

This paper introduces Test-Time Harness Evolution (TTHE), a method for optimizing LLM agent harnesses during evaluation using execution traces, without requiring gold labels or training.

View →

cs.MAcs.CRcs.LGRecentApr 25, 2026

Architecture Matters for Multi-Agent Security

Ben Hagag, William L. Anderson, Christian Schroeder de Witt, Sarah Scheffler

This paper empirically demonstrates that the architectural design of multi-agent systems significantly impacts their security, finding that coordination mechanisms can introduce vulnerabilities greate…

View →

cs.AREmpiricalRecentJul 3, 2026

ArchEval: Measuring AI Agents as Computer Architects

Chenyu Wang, Zishen Wan, Jeffrey Ma, Shvetank Prakash +7 more

This paper introduces ArchEval, a benchmark and platform for evaluating LLM agents on computer architecture design and optimization.

View →

cs.IRcs.AIRecentMay 29, 2026

DynaTree: Dynamic Agentic Retrieval Tree for Time-Sensitive News Retrieval

Siyuan Qi, Xinyuan Wang, Yingxuan Yang, Haochuan Guo +4 more

DynaTree introduces a two-stage framework that pre-constructs a reusable retrieval tree offline using coordinated agents, allowing for efficient, structure-aware, and highly effective time-sensitive n…

View →

cs.SDEmpiricalRecentJul 23, 2026

SoundscapeAgent: Agentic Soundscape Construction for Controllable Synthesis and Scalable Audio-Language Supervision

Hao Zhang, Yiwen Zhao, Yixuan Zhang, Yiwen Shao +1 more

The paper introduces an agentic soundscape construction framework for controllable compositional audio generation, which makes explicit the scene planning, source selection, temporal layout, and rende…

View →

cs.CRcs.AIRecentApr 27, 2026

SUDP: Secret-Use Delegation Protocol for Agentic Systems

Xiaohang Yu, Hejia Geng, Xinmeng Zeng, William Knottenbelt

The paper proposes the Secret-Use Delegation Protocol (SUDP) to solve the Agent Secret Use (ASU) problem, ensuring that autonomous agents can perform user-authorized operations without gaining reusabl…

View →

cs.CRcs.SERecentMay 5, 2026

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

Shihao Weng, Yang Feng, Jinrui Zhang, Xiaofei Xie +2 more

The paper introduces ARGUS, a defense mechanism that uses provenance-aware decision auditing to protect LLM agents from sophisticated, context-aware prompt injection attacks, significantly reducing th…

View →

cs.CRcs.AIcs.MARecentMar 23, 2026

STRIATUM-CTF: A Protocol-Driven Agentic Framework for General-Purpose CTF Solving

James Hugglestone, Samuel Jacob Chacko, Dawson Stoller, Ryan Schmidt +1 more

The paper introduces STRIATUM-CTF, a modular agentic framework that uses a standardized context protocol to enable LLMs to perform multi-step, stateful reasoning for general-purpose CTF solving, achie…

View →