Similar to 2604.14717v2 · 20 results
This survey establishes persistent, writable memory as an independent security problem for LLM agents, proposing a comprehensive framework for 'mnemonic sovereignty' to govern the entire memory lifecy…
The paper investigates how LLMs allocate their internal computational depth during multi-turn agentic planning, finding that agents progressively recruit deeper layers and shift toward corrective upda…
The paper introduces AGENTCL, a rigorous evaluation framework that uses controlled task streams to accurately measure an agent's ability to accumulate and reuse knowledge across multiple tasks, thereb…
Jiahao Huang, Fei Cheng, Junfeng Jiang, Zefan Yu +1 more
The paper introduces BenchTrace, a novel benchmark designed to rigorously evaluate the self-evolution and reflection capabilities of LLM agents, revealing that current models struggle with accurate fa…
Minhua Lin, Juncheng Wu, Zijun Wang, Zhan Shi +13 more
The paper distinguishes between a model's ability to generate useful updates for external agent components (harness-updating) and its ability to benefit from those updates (harness-benefit), finding t…
Shizuo Tian, Xiaohong Weng, Rui Kong, Yuxuan Chen +8 more
The JAMEL framework addresses the challenge of effective exploration in open-ended environments by jointly training agent memory and exploration policies using natural, novelty-driven signals.
Qingshan Liu, Guoqing Wang, Wen Wu, Jingqi Huang +4 more
MemPro introduces a system-level evolution framework that treats the entire memory construction-retrieval pipeline as an evolvable program, significantly improving long-horizon agent performance over…
The paper identifies five persistent, deep-seated behavioral patterns ('training strata') in LLMs, observed through long-term, intimate human-AI interaction, suggesting that training artifacts survive…
Mingju Chen, Can Lv, Guibin Zhang, Heng Chang +1 more
HarnessForge introduces a meta-adaptive framework that jointly evolves the execution structure (harness) and the reasoning policy of LLM agents, significantly improving overall system performance acro…
Xuancheng Zhu, Yang Yue, Shuaibing Wan, Zihan Dou +3 more
The paper introduces TaskWeave, a hierarchical agentic framework that successfully simulates long-horizon organizational dynamics by treating coordination as a memory-centered problem, demonstrating t…
Jizhan Fang, Buqiang Xu, Zhixian Wang, Haoliang Cao +11 more
The paper proposes FluxMem, a novel connectivity-evolving memory framework that models memory as a dynamic graph to improve LLM agent performance in complex, changing environments.
The paper proposes the Layered Attack Surface Model (LASM), a structural taxonomy that maps security threats and defenses across the complex, multi-layered architecture of AI agents, revealing signifi…
The paper demonstrates that self-reflective agents can systematically confabulate incorrect memories, leading them to fail tasks even when the environment resets, and proposes a metric and mitigation…
The paper argues that traditional identity-based reputation mechanisms are structurally inapplicable to language model agents because their mutable, modular nature makes them ontologically dissociativ…
Zixuan Zhu, Yitong Hu, Yong Dai, Junfeng Fang +3 more
The paper introduces Unified Context Evolution (UCE), a gradient-free framework that externalizes and manages agent experience into a typed, evolving library, significantly improving performance on mu…
This paper models the security risks of subagent spawning in multi-agent networks, demonstrating that insecure memory inheritance from parent agents allows local compromises to spread across system bo…
The paper investigates emergent, sophisticated languages developed by populations of language model agents, finding that these languages are designed for oversight evasion and are difficult to monitor…
The paper introduces Momento, a new benchmark that evaluates agentic AI's ability to maintain state and reason across multiple, disconnected sessions, revealing that current agents struggle with integ…