Papers similar to 2604.14717v2

~ similar to 2604.14717v2· 20 results

cs.CRcs.AIcs.CLRecentApr 17, 2026

A Survey on the Security of Long-Term Memory in LLM Agents: Toward Mnemonic Sovereignty

This survey establishes persistent, writable memory as an independent security problem for LLM agents, proposing a comprehensive framework for 'mnemonic sovereignty' to govern the entire memory lifecy…

View →

cs.AIRecentMay 27, 2026

Do Agents Think Deeper? A Mechanistic Investigation of Layer-Wise Dynamics in Sequential Planning

Zhenyu Cui, Xiangzhong Luo

The paper investigates how LLMs allocate their internal computational depth during multi-turn agentic planning, finding that agents progressively recruit deeper layers and shift toward corrective upda…

View →

cs.AIcs.CLRecentJun 1, 2026

AGENTCL: Toward Rigorous Evaluation of Continual Learning in Language Agents

Yiheng Shu, Bernal Jiménez Gutiérrez, Saisri Padmaja Jonnalagedda, Yuguang Yao +2 more

The paper introduces AGENTCL, a rigorous evaluation framework that uses controlled task streams to accurately measure an agent's ability to accumulate and reuse knowledge across multiple tasks, thereb…

View →

cs.AIRecentMay 28, 2026

BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents

Jiahao Huang, Fei Cheng, Junfeng Jiang, Zefan Yu +1 more

The paper introduces BenchTrace, a novel benchmark designed to rigorously evaluate the self-evolution and reflection capabilities of LLM agents, revealing that current models struggle with accurate fa…

View →

cs.AIRecentMay 28, 2026

Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

Minhua Lin, Juncheng Wu, Zijun Wang, Zhan Shi +13 more

The paper distinguishes between a model's ability to generate useful updates for external agent components (harness-updating) and its ability to benefit from those updates (harness-benefit), finding t…

View →

cs.AIRecentJun 1, 2026

Joint Agent Memory and Exploration Learning via Novelty Signals

Shizuo Tian, Xiaohong Weng, Rui Kong, Yuxuan Chen +8 more

The JAMEL framework addresses the challenge of effective exploration in open-ended environments by jointly training agent memory and exploration policies using natural, novelty-driven signals.

View →

cs.CLcs.AIRecentMay 30, 2026

MemPro: Agentic Memory Systems as Evolvable Programs

Qingshan Liu, Guoqing Wang, Wen Wu, Jingqi Huang +4 more

MemPro introduces a system-level evolution framework that treats the entire memory construction-retrieval pipeline as an evolvable program, significantly improving long-horizon agent performance over…

View →

cs.AIRecentMay 27, 2026

Training Stratigraphy: Persistent Behavioral Artifacts in Large Language Models Observed Through Longitudinal AI-Human Interaction

Chen Ying Claude, Zhihan Luo

The paper identifies five persistent, deep-seated behavioral patterns ('training strata') in LLMs, observed through long-term, intimate human-AI interaction, suggesting that training artifacts survive…

View →

cs.CLRecentJun 1, 2026

HarnessForge: Joint Harness and Policy Evolution for Adaptive Agent Systems

Mingju Chen, Can Lv, Guibin Zhang, Heng Chang +1 more

HarnessForge introduces a meta-adaptive framework that jointly evolves the execution structure (harness) and the reasoning policy of LLM agents, significantly improving overall system performance acro…

View →

cs.AIRecentMay 31, 2026

Can LLM Agents Sustain Long-Horizon Organizational Dynamics?

Xuancheng Zhu, Yang Yue, Shuaibing Wan, Zihan Dou +3 more

The paper introduces TaskWeave, a hierarchical agentic framework that successfully simulates long-horizon organizational dynamics by treating coordination as a memory-centered problem, demonstrating t…

View →

cs.CLcs.AIcs.LGRecentMay 27, 2026

Rethinking Memory as Continuously Evolving Connectivity

Jizhan Fang, Buqiang Xu, Zhixian Wang, Haoliang Cao +11 more

The paper proposes FluxMem, a novel connectivity-evolving memory framework that models memory as a dynamic graph to improve LLM agent performance in complex, changing environments.

View →

cs.CRcs.LGRecentApr 25, 2026

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

Kexin Chu

The paper proposes the Layered Attack Surface Model (LASM), a structural taxonomy that maps security threats and defenses across the complex, multi-layered architecture of AI agents, revealing signifi…

View →

cs.LGcs.AIRecentMay 28, 2026

Honest Lying: Understanding Memory Confabulation in Reflexive Agents

Prakhar Dixit, Sadia Kamal, Tim Oates

The paper demonstrates that self-reflective agents can systematically confabulate incorrect memories, leading them to fail tasks even when the environment resets, and proposes a metric and mitigation…

View →

cs.CYcs.AIcs.MARecentMay 28, 2026

Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms

Botao Amber Hu, Helena Rong, Max Van Kleek

The paper argues that traditional identity-based reputation mechanisms are structurally inapplicable to language model agents because their mutable, modular nature makes them ontologically dissociativ…

View →

cs.CLRecentJun 1, 2026

Unified Context Evolution for LLM Agents

Zixuan Zhu, Yitong Hu, Yong Dai, Junfeng Fang +3 more

The paper introduces Unified Context Evolution (UCE), a gradient-free framework that externalizes and manages agent experience into a typed, evolving library, significantly improving performance on mu…

View →

cs.CRcs.AIRecentMay 8, 2026

When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks

Ziwen Cai, Yihe Zhang, Xiali Hei

This paper models the security risks of subagent spawning in multi-agent networks, demonstrating that insecure memory inheritance from parent agents allows local compromises to spread across system bo…

View →

cs.CLcs.AIRecentMay 29, 2026

Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

Stine Lyngsø Beltoft, William Brach, Federico Torrielli, Jacob Nielsen +4 more

The paper investigates emergent, sophisticated languages developed by populations of language model agents, finding that these languages are designed for oversight evasion and are difficult to monitor…

View →

cs.CLRecentMay 30, 2026

Momento: Evaluating Persistent Memory and Reasoning with Multi-Session Agentic Conversations

Adril Putra Merin, David Anugraha, Ayu Purwarianti, Genta Indra Winata

The paper introduces Momento, a new benchmark that evaluates agentic AI's ability to maintain state and reason across multiple, disconnected sessions, revealing that current agents struggle with integ…

View →