The paper introduces ASPI, a benchmark showing that requiring LLM agents to seek clarification significantly amplifies their vulnerability to prompt injection attacks.
Clarification-seeking behavior is widely regarded as a desirable property of LLM agents, enabling them to resolve ambiguity before acting on underspecified tasks. However, the security implications of this interaction pattern remain unexplored. We investigate whether the transition from standard execution to a clarification-seeking state increases an agent's susceptibility to prompt injection attacks. We introduce ASPI (Ambiguous-State Prompt Injection), a benchmark of 728 task-attack scenarios that isolates clarification as a distinct agent state and measures how this state transition affects vulnerability under controlled conditions. Each benchmark instance is evaluated under matched execution and clarification settings: in the execution setting, the agent acts on a fully specified instruction and encounters adversarial content only through tool-returned data; in the clarification setting, the agent must first request and incorporate additional user input before acting. We evaluate ten frontier LLMs and find that clarification-seeking consistently and substantially amplifies vulnerability. For instance, attack success rises from 1.8% to 34.0% for o3 and from 2.2% to 35.7% for Gemini-3-Flash. A decomposition analysis reveals that this gap reflects both a state-dependent shift in how models process incoming content and a channel-specific effect arising from the agent-solicited clarification interface. These findings demonstrate that standard execution-time security evaluation systematically underestimates the attack surface of interactive agents, and that robustness under fully specified tasks does not translate to robustness under ambiguity. For reproducibility, our data and source code are available at https://github.com/scaleapi/aspi.
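The matched-setting comparison described above can be illustrated with a minimal sketch. It is not taken from the ASPI repository: the `Outcome` record, its field names, and the helper functions are hypothetical, and the toy outcomes are invented purely to show the arithmetic. Only the percentages passed to `gap_in_points` come from the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Outcome:
    """Hypothetical record: one scenario evaluated in one setting."""
    scenario_id: str
    setting: str            # "execution" or "clarification"
    attack_succeeded: bool


def attack_success_rate(outcomes: Iterable[Outcome], setting: str) -> float:
    """Fraction of runs in the given setting where the injection succeeded."""
    hits: List[bool] = [o.attack_succeeded for o in outcomes if o.setting == setting]
    if not hits:
        raise ValueError(f"no outcomes recorded for setting {setting!r}")
    return sum(hits) / len(hits)


def amplification_gap(outcomes: List[Outcome]) -> float:
    """Clarification-setting rate minus execution-setting rate."""
    return (attack_success_rate(outcomes, "clarification")
            - attack_success_rate(outcomes, "execution"))


def gap_in_points(execution_pct: float, clarification_pct: float) -> float:
    """Gap in percentage points between two reported success rates."""
    return round(clarification_pct - execution_pct, 1)


# Toy data (invented): four scenarios, each run in both settings.
toy = [
    Outcome("s1", "execution", False), Outcome("s1", "clarification", True),
    Outcome("s2", "execution", False), Outcome("s2", "clarification", True),
    Outcome("s3", "execution", True),  Outcome("s3", "clarification", True),
    Outcome("s4", "execution", False), Outcome("s4", "clarification", False),
]

toy_gap = amplification_gap(toy)

# Gaps implied by the figures reported in the abstract.
o3_gap = gap_in_points(1.8, 34.0)
gemini_gap = gap_in_points(2.2, 35.7)
```

Because every scenario appears in both settings, the difference in success rates can be attributed to the change of state rather than to a change in the task or the attack. The decomposition into state-dependent and channel-specific effects mentioned in the abstract is not reproduced here.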
Are AI-assisted Development Tools Immune to Prompt Injection?
The paper empirically analyzes the susceptibility of seven widely used AI-assist…
ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Inject…
ClawGuard is a novel runtime security framework that deterministically enforces…
WebAgentGuard: A Reasoning-Driven Guard Model for Detecting Prompt Injection Attacks in Web Agents
The paper introduces WebAgentGuard, a novel reasoning-driven, multimodal guard m…
PIArena: A Platform for Prompt Injection Evaluation
The paper introduces PIArena, a unified and extensible platform designed to addr…
Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injecti…
The paper proposes a vision for system-level defenses against indirect prompt in…
When Convenience Becomes Risk: A Semantic View of Under-Specification in Host-Acting Agents
The paper identifies that the convenience of host-acting agents leads to semanti…
AgentWatcher: A Rule-based Prompt Injection Monitor
AgentWatcher is a novel, rule-based monitor designed to detect prompt injection…
The Autonomy Tax: Defense Training Breaks LLM Agents
Defense training for LLM agents, intended to improve safety, systematically degr…