~ similar to 2604.00430v1· 20 results
This survey establishes persistent, writable memory as an independent security problem for LLM agents, proposing a comprehensive framework for 'mnemonic sovereignty' to govern the entire memory lifecy…
The paper proposes a novel bi-level exact unlearning attack targeting Large Reasoning Models (LRMs) that forces incorrect final answers while generating misleading reasoning traces, highlighting new s…
Jie Fu, Nima Naderloui, Da Zhong, Yuan Hong +1 more
This paper introduces TC-UMIA, a novel tri-class membership inference attack, demonstrating that machine unlearning can leak privacy risks to the retained data set, and evaluates defense mechanisms to…
PURGE is a novel machine unlearning algorithm that leverages the duality between continual learning and unlearning to achieve high data retention while making the unlearned model indistinguishable fro…
Xingyu Lyu, Jianfeng He, Ning Wang, Yidan Hu +4 more
The paper proposes ADAM, a novel and highly effective privacy attack that systematically extracts sensitive data from LLM agent memory by adaptively querying the victim agent's memory based on data di…
This paper proposes a modified SISA framework to achieve efficient class-level unlearning in CNNs, allowing the removal of specific data influence without full model retraining.
Pritam Dash, Tongyu Ge, Aditi Jain, Tanmay Shah +1 more
This paper systematically studies memory poisoning attacks in LLM agents, identifying multiple vulnerabilities and proposing a new benchmark to assess the risk.
Ziyan Liu, Zhezheng Hao, Yeqiu Chen, Hong Wang +6 more
The paper introduces Metacognitive Memory Policy Optimization (MMPO), a novel memory training approach that optimizes LLM memory not based on final task success, but on minimizing epistemic uncertaint…
This paper analyzes memory poisoning attacks targeting multi-agent systems (MAS) powered by LLMs, proposing mitigation strategies across various memory types, especially focusing on secure design prin…
Zihan Wang, Rui Zhang, Yu Liu, Chi Liu +3 more
This paper presents the first systematic study of black-box skill stealing attacks against proprietary LLM agents, demonstrating that structured agent skills can be easily extracted, posing a signific…
The paper introduces AGENTCL, a rigorous evaluation framework that uses controlled task streams to accurately measure an agent's ability to accumulate and reuse knowledge across multiple tasks, thereb…
Yuhui Wang, Tanqiu Jiang, Jiacheng Liang, Charles Fleming +1 more
The paper introduces MAGE, a novel defensive framework that uses a dedicated 'shadow memory' to proactively detect and mitigate long-horizon threats against LLM agents during complex, multi-step inter…
The paper proposes the Layered Attack Surface Model (LASM), a structural taxonomy that maps security threats and defenses across the complex, multi-layered architecture of AI agents, revealing signifi…
The paper proposes MemPoison, a novel memory poisoning attack that injects triggerable backdoors into LLM agents' long-term memory through dialogue interactions, achieving high success rates by bypass…
The paper introduces MemPoison, a novel memory poisoning attack that successfully injects triggerable backdoors into LLM agents' long-term memory through conversational interactions, achieving high at…
Weidong Zheng, Kongyang Chen, Yao Huang, Yuanwei Guo +1 more
This paper analyzes and proposes four novel attack methods—based on model parameters and model inversion—to demonstrate that existing machine unlearning techniques can inadvertently leak the categorie…
This paper introduces the first complete pipeline for federated unlearning, proposing an efficient unlearning approach and a novel visualization framework (Skyeye) to evaluate a model's forgetting cap…
Zihan Liu, Yizhen Wang, Rui Wang, Xiu Tang +1 more
This survey provides a comprehensive, structured taxonomy of split learning techniques for fine-tuning Large Language Models (LLMs), covering model optimization, system efficiency, and privacy preserv…
Xinyang Lu, Jiabao Pan, Rachael Hwee Ling Sim, See-Kiong Ng +2 more
The paper proposes DareU, a novel LLM unlearning framework that optimizes unlearning by zeroing out data attribution scores instead of maximizing prediction loss, achieving effective unlearning while…
Mengying Zhang, Derui Wang, Ruoxi Sun, Xiaoyu Xia +2 more
This paper provides the first integrated analysis of model dememorization, unifying unlearnability and unlearning methods, and offering theoretical guarantees on dememorization depth.