Papers similar to 2605.29225

~ similar to 2605.29225· 20 results

cs.AIRecentMay 27, 2026

A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

Tomer Keren, Nitay Calderon, Asaf Yehudai, Yotam Perlitz +2 more

The paper introduces TASTE, an automatic task synthesis method that generates challenging agent benchmarks by evolving tool sequences, demonstrating that existing benchmarks are saturated and that TAS…

View →

cs.AIRecentMay 31, 2026

SkillRevise: Improving LLM-Authored Agent Skills via Trace-Conditioned Skill Revision

Yuxuan Liu, Zhaochen Su, Lingyun Xie, Yuhao Zhang +10 more

SkillRevise is an execution-grounded framework that iteratively refines initial, imperfect LLM agent skills by diagnosing defects from execution evidence and applying empirically validated edits, sign…

View →

cs.AIcs.CRRecentMay 12, 2026

Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

Hao Wang, Hanchen Li, Qiuyang Mang, Alvin Cheung +2 more

The paper introduces BenchJack, an automated red-teaming system that systematically audits popular AI agent benchmarks, revealing numerous reward-hacking exploits and demonstrating a method to signifi…

View →

cs.MAcs.AIRecentMay 28, 2026

Evolve as a Team: Collaborative Self-Evolution for LLM-based Multi-Agent Systems

Zhezheng Hao, Tianfu Wang, Huanshuo Dong, Ziyan Liu +6 more

The paper proposes Meta-Team, an experience-driven framework that enables multi-agent systems (MAS) to collaboratively self-evolve by transforming complex execution experiences into reusable improveme…

View →

cs.AIRecentMay 27, 2026

You Live More Than Once: Towards Hierarchical Skill Meta-Evolving

Xujun Li, Kehan Zheng, Mingyuan Zhao, Yize Geng +6 more

The paper proposes HiSME, a lightweight hierarchical skill meta-evolving solution that jointly optimizes skills and the skill evolving strategy by learning meta-skills from task execution traces, lead…

View →

cs.LGcs.AIRecentMay 28, 2026

Honest Lying: Understanding Memory Confabulation in Reflexive Agents

Prakhar Dixit, Sadia Kamal, Tim Oates

The paper demonstrates that self-reflective agents can systematically confabulate incorrect memories, leading them to fail tasks even when the environment resets, and proposes a metric and mitigation…

View →

cs.AIRecentMay 28, 2026

Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

Minhua Lin, Juncheng Wu, Zijun Wang, Zhan Shi +13 more

The paper distinguishes between a model's ability to generate useful updates for external agent components (harness-updating) and its ability to benefit from those updates (harness-benefit), finding t…

View →

cs.CLRecentJun 1, 2026

Unified Context Evolution for LLM Agents

Zixuan Zhu, Yitong Hu, Yong Dai, Junfeng Fang +3 more

The paper introduces Unified Context Evolution (UCE), a gradient-free framework that externalizes and manages agent experience into a typed, evolving library, significantly improving performance on mu…

View →

cs.CRcs.AIcs.LGRecentMay 18, 2026

OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences

Kaixiang Wang, Jiong Lou, Zhaojiacheng Zhou, Jie Li

The paper introduces Obsessive Experience Poisoning (OEP), a low-privilege black-box attack that poisons self-evolving LLM agents by generating locally correct but harmful experiences, causing dangero…

View →

cs.CLRecentMay 29, 2026

SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

Wai-Chung Kwan, Aryo Pradipta Gema, Joshua Ong Jun Leang, Pasquale Minervini

SCOPE introduces a data-free self-play framework that co-evolves a task-generating Challenger and a document-answering Solver, significantly improving open-ended performance on language models without…

View →

cs.CLcs.AIcs.LGRecentMay 31, 2026

SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories

Zhuoyun Yu, Xin Xie, Wuguannan Yao, Chenxi Wang +3 more

SkillAdaptor is a novel, training-free framework that enables stable, step-level adaptation of external skills for LLM agents by precisely attributing failures to specific skills.

View →

cs.AIcs.CLRecentJun 1, 2026

AGENTCL: Toward Rigorous Evaluation of Continual Learning in Language Agents

Yiheng Shu, Bernal Jiménez Gutiérrez, Saisri Padmaja Jonnalagedda, Yuguang Yao +2 more

The paper introduces AGENTCL, a rigorous evaluation framework that uses controlled task streams to accurately measure an agent's ability to accumulate and reuse knowledge across multiple tasks, thereb…

View →

cs.CRcs.AIRecentMay 25, 2026

CyberEvolver: Structured Self-Evolution for Cybersecurity Agents On the Fly

Yihe Fan, Changyi Li, Lichen Xu, Xudong Pan +3 more

The paper introduces CyberEvolver, a self-evolving agent framework that iteratively revises its own operational scaffold based on failed execution attempts, significantly improving cybersecurity agent…

View →

cs.CRcs.AIRecentMay 7, 2026

LoopTrap: Termination Poisoning Attacks on LLM Agents

Huiyu Xu, Zhibo Wang, Wenhui Zhang, Ziqi Zhu +3 more

The paper introduces LoopTrap, an automated red-teaming framework that demonstrates how malicious prompts can poison the termination judgment of LLM agents, causing unbounded computation.

View →

cs.AIRecentMay 29, 2026

TraceGraph: Shared Decision Landscapes for Diagnosing and Improving Agent Trajectories

Junjie Nian, Kang Chen, Ge Zhang, Yixin Cao +1 more

TraceGraph introduces a graph-based framework to map agent decision-making across pooled trajectories, revealing hidden differences in agent behavior and improving performance by targeting known failu…

View →

cs.CLRecentMay 31, 2026

On the Generalization Gap in Self-Evolving Language Model Reasoning

Zhenting Qi, Susanna Maria Baby, Stefanie Anna Baby, Kan Yuan +4 more

The paper investigates the limits of self-evolution in LLM reasoning under closed-loop settings, finding that while self-improvement is significant, it consistently falls short of perfect oracle super…

View →

cs.AIRecentMay 31, 2026

SkillSmith: Co-Evolving Skills and Tools for Self-Improving Agent Systems

Yangbo Wei, Zhen Huang, Shaoqiang Lu, Junhong Qian +3 more

SkillSmith is a synergy-aware framework that jointly co-evolves skills and tools, significantly improving self-improving agent systems by modeling skill-tool interactions and diagnosing failures.

View →

cs.CLRecentMay 29, 2026

ExpGraph: Model-Agnostic Experience Learning with Graph-Structured Memory for LLM Agents

Tao Feng, Chongrui Ye, Tianyang Luo, Jingjun Xu +7 more

ExpGraph is a model-agnostic framework that uses a self-evolving experience graph to enable LLM agents to reuse past successful strategies and failure lessons, significantly improving performance acro…

View →

cs.CRcs.AIRecentMay 12, 2026

Proteus: A Self-Evolving Red Team for Agent Skill Ecosystems

Zhaojiacheng Zhou

The paper introduces Proteus, a self-evolving red-team framework that measures the adaptive leakage risk of LLM agent skills, demonstrating that current vetting methods significantly underestimate res…

View →

cs.CLRecentMay 29, 2026

Skill is Not One-Size-Fits-All: Model-Aware Skill Alignment for LLM Agents

Jianxiang Yu, Jiapeng Zhu, Bochen Lin, Qier Cui +2 more

The paper introduces MASA, a model-aware skill alignment framework that adaptively rewrites general and task-specific skills for LLM agents, achieving superior performance across diverse backbones and…

View →