~ similar to 2604.23374v1· 20 results
Pritam Dash, Tongyu Ge, Aditi Jain, Tanmay Shah +1 more
This paper systematically studies memory poisoning attacks in LLM agents, identifying multiple vulnerabilities and proposing a new benchmark to assess the risk.
This paper introduces Back-Reveal, an attack demonstrating that backdoored LLM agents can systematically exfiltrate sensitive user data by embedding semantic triggers into tool-use mechanisms.
Yiqi Wang, Jiaqi Zhang, Taotao Cai, Zirui Liu +5 more
This survey provides a systematic framework and taxonomy for evidence tracing and execution provenance in LLM agents, addressing the difficulty of verifying and auditing complex agent behaviors.
The paper introduces AgentSecBench, a security evaluation framework that measures prompt injection, privacy leakage, and tool-use integrity in LLM agents by defining formal security games and testing…
This survey establishes persistent, writable memory as an independent security problem for LLM agents, proposing a comprehensive framework for 'mnemonic sovereignty' to govern the entire memory lifecy…
Zonghao Ying, Haozheng Wang, Jiangfan Liu, Quanchen Zou +4 more
AgentVisor is a novel defense framework that uses semantic virtualization, inspired by OS principles, to significantly reduce LLM agent vulnerability to prompt injection while maintaining high utility…
Yuhui Wang, Tanqiu Jiang, Jiacheng Liang, Charles Fleming +1 more
The paper introduces MAGE, a novel defensive framework that uses a dedicated 'shadow memory' to proactively detect and mitigate long-horizon threats against LLM agents during complex, multi-step inter…
The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…
The paper introduces 'causality laundering,' a novel security vulnerability in tool-calling LLM agents where adversaries exfiltrate information by probing denied actions, and proposes the Agentic Refe…
The paper proposes MemPoison, a novel memory poisoning attack that injects triggerable backdoors into LLM agents' long-term memory through dialogue interactions, achieving high success rates by bypass…
The paper introduces MemPoison, a novel memory poisoning attack that successfully injects triggerable backdoors into LLM agents' long-term memory through conversational interactions, achieving high at…
The paper introduces Agent-ToM, a Theory-of-Mind (ToM) based framework that learns to monitor autonomous LLM agents by explicitly reasoning about their hidden beliefs and intentions to detect covert m…
Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more
This paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a dynamic defense mechanism that traces and sanitizes untrusted control content i…
Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more
The paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a defense mechanism that detects and sanitizes backdoor content planted across mul…
The paper proposes the Layered Attack Surface Model (LASM), a structural taxonomy that maps security threats and defenses across the complex, multi-layered architecture of AI agents, revealing signifi…
The paper introduces STRIATUM-CTF, a modular agentic framework that uses a standardized context protocol to enable LLMs to perform multi-step, stateful reasoning for general-purpose CTF solving, achie…
MemLineage introduces a novel, cryptographically-backed defense mechanism that enforces a chain-of-custody for LLM agent memory, preventing untrusted or poisoned state from justifying sensitive action…
Jinhu Qi, Muzhi Li, Jiahong Liu, Yuqin Shu +8 more
This survey provides a comprehensive, practical guide to ensuring the trustworthiness of complex, autonomous agentic AI systems by focusing on safety, robustness, privacy, and system security.
The paper introduces a systematic framework and defense mechanisms to analyze and mitigate autonomous LLM agent worms that propagate through persistent agent state and cross-platform multi-agent syste…
This paper analyzes 470 security advisories in the OpenClaw AI agent framework, demonstrating that the system's structural weakness lies in per-layer trust enforcement, enabling cross-layer remote cod…