~ similar to 2605.25002v2· 20 results
MemLineage introduces a novel, cryptographically-backed defense mechanism that enforces a chain-of-custody for LLM agent memory, preventing untrusted or poisoned state from justifying sensitive action…
SeqWM introduces a sequential behavioral watermarking framework that embeds ownership signals into history-conditioned transition patterns of LLM agent actions, providing robust and position-agnostic…
Xuanye Zhang, Yongsen Zheng, Zhuqin Xu, Kaiyu Zhou +4 more
MemMorph introduces a novel memory poisoning attack that biases LLM agent tool selection by injecting crafted records into the agent's long-term memory, achieving high success rates even against moder…
The paper systematically evaluates various defense mechanisms against persistent memory attacks on LLM agents, finding that only tool-gating at the memory layer (Memory Sandbox) effectively mitigates…
This survey establishes persistent, writable memory as an independent security problem for LLM agents, proposing a comprehensive framework for 'mnemonic sovereignty' to govern the entire memory lifecy…
The paper proposes a novel binomial multibit LLM watermarking scheme that encodes every bit of a payload at every token position, achieving superior message accuracy and robustness compared to existin…
Shuhao Zhang, Yuli Chen, Jiale Han, Bo Cheng +1 more
The paper proposes Adaptive Stealing (AS), a novel and more robust watermark stealing algorithm that dynamically selects optimal attack perspectives to significantly increase the efficiency of comprom…
TimeMark proposes a trustworthy time watermarking framework that uses cryptographic techniques and error-correcting codes to achieve 100% accurate recovery of the generation time from AIGC, resisting…
Debeshee Das, Julien Piet, Darya Kaviani, Luca Beurer-Kellner +2 more
The paper introduces Trojan Hippo, a persistent memory attack that exfiltrates sensitive data from LLM agents by planting dormant payloads into long-term memory, and develops a comprehensive framework…
The paper argues that watermarking must be viewed as a monitoring primitive, introducing an observer-based threat model that shows even zero-bit watermarking can enable entity-level attribution throug…
The paper introduces memorywire, a vendor-neutral JSON-Schema wire format and reference implementation designed to standardize and govern memory operations across disparate agent-memory frameworks.
The paper introduces memorywire, a vendor-neutral JSON-Schema 2020-12 wire format and reference implementation to standardize and govern agent memory operations across diverse, proprietary agent-memor…
The paper introduces Portable Agent Memory, an open protocol designed to allow persistent, cryptographically-verified memory state to be reliably transferred between diverse and heterogeneous AI agent…
Yan Liang, Ziyuan Yang, Mengyu Sun, Joey Tianyi Zhou +1 more
The paper proposes SubPopMark, a novel subpopulation-driven framework that injects harmless, verifiable markers into distilled datasets to prevent copyright infringement and data leakage.
The paper proposes SAGE, a novelty-aware gate that efficiently controls memory updates in agentic LLMs by classifying new facts as clearly novel, clearly redundant, or uncertain, thereby significantly…
Zhihao Wu, Gracia Gong, Qinglin Zhu, Yudong Chen +1 more
The paper demonstrates that combining outputs from multiple large language models (LLMs) effectively cancels out statistical watermarks, revealing a fundamental vulnerability in current AI text detect…
Xinle Deng, Ruobin Zhong, Hujin Peng, Xiaoben Lu +14 more
The paper introduces MemTrace, a framework that treats LLM memory pipelines as traceable graphs to systematically diagnose and automatically correct memory-related errors, boosting performance by up t…
The paper proposes reframing mechanistic anomaly detection (MAD) as a functional attribution problem, using influence functions to measure how much a model's output depends on specific input samples,…
Pritam Dash, Tongyu Ge, Aditi Jain, Tanmay Shah +1 more
This paper systematically studies memory poisoning attacks in LLM agents, identifying multiple vulnerabilities and proposing a new benchmark to assess the risk.
Yaopeng Wang, Qingliang Wang, Zhibo Wang, Huiyu Xu +4 more
LoRA-Key introduces a user-centric watermarking framework that attaches a recoverable ownership key to LoRA modules via a standalone Watermark LoRA, providing lightweight, plug-and-play copyright prot…