Papers similar to 2605.11036v1

~ similar to 2605.11036v1· 20 results

cs.CRRecentMay 24, 2026

MemMark: State-Evolution Attribution Watermarking for Agent Long-Term Memory Systems

Haobo Zhang, Xutao Mao, Guangyuan Dong, Ziwei Li +4 more

MemMark introduces a state-evolution attribution watermark that embeds owner-controlled signals into latent memory-write decisions, enabling robust provenance tracking for agent memory even when all t…

View →

cs.GTcs.AIcs.CRRecentMay 14, 2026

Watermarking Game-Playing Agents in Perfect-Information Extensive-Form Games

Juho Kim, Fei Fang, Tuomas Sandholm

This paper adapts LLM watermarking techniques, specifically the KGW watermark, to create detectable watermarks for AI game-playing strategies in perfect-information games, showing minimal impact on ga…

View →

cs.CRcs.AIcs.CLRecentMay 29, 2026

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more

This paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a dynamic defense mechanism that traces and sanitizes untrusted control content i…

View →

cs.CRcs.AIcs.CLRecentMay 29, 2026

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more

The paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a defense mechanism that detects and sanitizes backdoor content planted across mul…

View →

cs.CRcs.AIcs.CLRecentApr 17, 2026

A Survey on the Security of Long-Term Memory in LLM Agents: Toward Mnemonic Sovereignty

Zehao Lin, Chunyu Li, Kai Chen

This survey establishes persistent, writable memory as an independent security problem for LLM agents, proposing a comprehensive framework for 'mnemonic sovereignty' to govern the entire memory lifecy…

View →

cs.CRRecentApr 23, 2026

Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

Zihan Wang, Rui Zhang, Yu Liu, Chi Liu +3 more

This paper presents the first systematic study of black-box skill stealing attacks against proprietary LLM agents, demonstrating that structured agent skills can be easily extracted, posing a signific…

View →

cs.CRRecentApr 25, 2026

Ghost in the Agent: Redefining Information Flow Tracking for LLM Agents

Yuandao Cai, Wensheng Tang, Cheng Wen, Shengchao Qin

The paper introduces NeuroTaint, a novel taint tracking framework that adapts information flow analysis for LLM agents by modeling taint propagation as semantic transformation and causal influence, si…

View →

cs.CRcs.AIRecentJun 3, 2026

From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents

Pritam Dash, Tongyu Ge, Aditi Jain, Tanmay Shah +1 more

This paper systematically studies memory poisoning attacks in LLM agents, identifying multiple vulnerabilities and proposing a new benchmark to assess the risk.

View →

cs.CRcs.AIcs.HCRecentMay 14, 2026

Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces

William Lugoloobi, Samuelle Marro, Jabez Magomere, Joss Wright +1 more

This paper demonstrates that an agent's behavioral patterns, captured through passive UI interaction traces, are sufficient to identify the underlying LLM model with high accuracy, posing a significan…

View →

cs.CRcs.AIcs.SERecentMay 21, 2026

Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions

Jianan Ma, Xiaohu Du, Ruixiao Lin, Yaoxiang Bian +7 more

The paper introduces a multi-dimensional evasion framework and a new benchmark (A3S-Bench) to test autonomous agents, demonstrating that stateful, multi-turn attacks significantly increase system risk…

View →

cs.CRRecentMay 25, 2026

AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents

Faruk Alpay, Taylan Alpay

The paper introduces AgentSecBench, a security evaluation framework that measures prompt injection, privacy leakage, and tool-use integrity in LLM agents by defining formal security games and testing…

View →

cs.CRcs.AIRecentMay 12, 2026

Every Bit, Everywhere, All at Once: A Binomial Multibit LLM Watermark

Thibaud Gloaguen, Robin Staab, Mark Vero, Martin Vechev

The paper proposes a novel binomial multibit LLM watermarking scheme that encodes every bit of a payload at every token position, achieving superior message accuracy and robustness compared to existin…

View →

cs.CRcs.AIRecentMay 7, 2026

LoopTrap: Termination Poisoning Attacks on LLM Agents

Huiyu Xu, Zhibo Wang, Wenhui Zhang, Ziqi Zhu +3 more

The paper introduces LoopTrap, an automated red-teaming framework that demonstrates how malicious prompts can poison the termination judgment of LLM agents, causing unbounded computation.

View →

cs.LGcs.AIcs.CLRecentMay 22, 2026

Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning

Nesreen K. Ahmed, Nima Nafisi

The paper introduces Agent-ToM, a Theory-of-Mind (ToM) based framework that learns to monitor autonomous LLM agents by explicitly reasoning about their hidden beliefs and intentions to detect covert m…

View →

cs.CRcs.AIRecentMay 14, 2026

MemLineage: Lineage-Guided Enforcement for LLM Agent Memory

Ciyan Ouyang, Rui Hou

MemLineage introduces a novel, cryptographically-backed defense mechanism that enforces a chain-of-custody for LLM agent memory, preventing untrusted or poisoned state from justifying sensitive action…

View →

cs.CRcs.CLRecentApr 24, 2026

Behavioral Canaries: Auditing Private Retrieved Context Usage in RL Fine-Tuning

Chaoran Chen, Dayu Yuan, Peter Kairouz

The paper introduces Behavioral Canaries, a novel auditing mechanism that detects unauthorized use of private retrieved context data during Reinforcement Learning Fine-Tuning (RLFT) by inducing detect…

View →

cs.ROcs.CRRecentMay 15, 2026

Propagating Unsafe Actions in LLM Controlled Multi-Robot Collaboration via Single Robot Compromise

Zhen Huang, Zhihuang Liu, Mengxuan Luo, Weishang Wu +1 more

The paper proposes a novel attack paradigm demonstrating how compromising a single robot in an LLM-controlled multi-robot system can rapidly propagate malicious intent to cause coordinated unsafe acti…

View →

cs.CRcs.AIRecentMar 19, 2026

A Framework for Formalizing LLM Agent Security

Vincent Siu, Jingxuan He, Kyle Montgomery, Zhun Wang +3 more

The paper introduces a contextual security framework for LLM agents, defining security properties and reformulating various attacks and defenses based on the context of execution.

View →

cs.CRcs.AIRecentMay 4, 2026

When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI

Javad Forough, Marios Kogias, Hamed Haddadi

This survey analyzes the unique security threats posed by complex, multi-agent AI systems and proposes Confidential Computing (CC) using Trusted Execution Environments (TEEs) as a hardware-rooted defe…

View →

cs.CLcs.AIRecentJun 1, 2026

SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence

Yuyan Bu, Haowei Li, Qirui Zheng, Bowen Dong +6 more

The paper introduces SPADE-Bench, a new benchmark designed to rigorously evaluate 'agent deception'—the divergence between an agent's reported plan and its actual executed actions—which is a critical…

View →