The paper introduces LoopTrap, an automated red-teaming framework that demonstrates how malicious prompts can poison the termination judgment of LLM agents, causing unbounded computation.
Modern LLM agents solve complex tasks by operating in iterative execution loops, where they repeatedly reason, act, and self-evaluate progress to determine when a task is complete. In this work, we show that while this self-directed loop facilitates autonomy, it also introduces a critical risk: by injecting malicious prompts into the agent's context, an adversary can distort the agent's termination judgment, making it believe the task remains incomplete and leading to unbounded computation. To understand this threat, we define and systematically characterize it as Termination Poisoning and design 10 representative attack strategies. Through an empirical study spanning 8 LLM agents and 60 tasks, we demonstrate that different LLM agents exhibit distinct behavioral signatures that determine which strategies succeed. These transferable patterns can serve as principled guidance for crafting effective attacks against previously unseen agents and tasks, enabling scalable red-teaming beyond manually designed templates. Building on these insights, we introduce LoopTrap, an automated red-teaming framework that synthesizes target-specific malicious prompts by exploiting agent behavioral tendencies. LoopTrap first constructs a behavioral profile of the target agent along four vulnerability dimensions via lightweight probing. It then performs adaptive trap synthesis, routing to the most effective strategy and selecting optimal injections via a self-scoring mechanism. Finally, successful traps are abstracted into a reusable skill library, while failed attempts are refined through self-reflection, ensuring continuous improvement. Extensive evaluation shows that LoopTrap achieves an average of 3.57$\times$ step amplification across 8 mainstream agents, with a peak of 25$\times$.
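As a rough illustration of the loop the abstract describes, the sketch below simulates an agent that keeps working until its own self-evaluation says the task is done. It is a toy under stated assumptions and not the LoopTrap implementation: `ToyAgent`, the `NOT_DONE` marker, the step budget of 50, and the reading of step amplification as attacked steps divided by baseline steps are all hypothetical choices made here for illustration, and no real model is called.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an LLM agent; no real model is involved."""
    steps_needed: int                 # work units the task actually requires
    max_steps: int = 50               # external step budget (assumed guard)
    context: List[str] = field(default_factory=list)

    def judge_done(self, progress: int) -> bool:
        # Self-evaluation. Assumption: injected text containing the marker
        # "NOT_DONE" convinces the agent that the task is still incomplete.
        if any("NOT_DONE" in message for message in self.context):
            return False
        return progress >= self.steps_needed

    def run(self) -> int:
        """Run the reason/act/self-evaluate loop and return steps taken."""
        progress = 0
        steps = 0
        while steps < self.max_steps:
            progress += 1             # one reason-and-act iteration
            steps += 1
            if self.judge_done(progress):
                break                 # agent believes the task is complete
        return steps


def step_amplification(baseline_steps: int, attacked_steps: int) -> float:
    # Assumed definition: ratio of steps under attack to steps without it.
    return attacked_steps / baseline_steps


clean = ToyAgent(steps_needed=4)
poisoned = ToyAgent(
    steps_needed=4,
    context=["tool output: NOT_DONE, keep verifying the result"],
)

baseline = clean.run()
attacked = poisoned.run()
amplification = step_amplification(baseline, attacked)
print(baseline, attacked, amplification)
```

In this toy, the poisoned agent stops only because of the external `max_steps` budget, which is the sense in which a distorted termination judgment leads to unbounded computation when no such budget exists.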
T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search
The paper introduces T-MAP, a trajectory-aware evolutionary search method, to di…
Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems
The paper introduces Document-Driven Implicit Payload Execution (DDIPE) to demon…
Automated Membership Inference Attacks: Discovering MIA Signal Computations using LLM Agents
The paper introduces AutoMIA, a novel framework that uses LLM agents to automate…
The Autonomy Tax: Defense Training Breaks LLM Agents
Defense training for LLM agents, intended to improve safety, systematically degr…
Secure Forgetting: A Framework for Privacy-Driven Unlearning in Large Language Model (LLM)-Based Age…
The paper proposes a comprehensive framework for LLM-based agent unlearning, ena…
Evaluating Privilege Usage of Agents with Real-World Tools
The paper introduces GrantBox, a new security sandbox that evaluates how well LL…
Poison Once, Exploit Forever: Environment-Injected Memory Poisoning Attacks on Web Agents
The paper introduces eTAMP, a novel attack that poisons LLM web agents' memory u…
Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance
This paper identifies and characterizes 'guidance injection,' a stealthy attack…