The paper introduces a contextual security framework for LLM agents, defining security properties and reformulating various attacks and defenses based on the context of execution.
Security in LLM agents is inherently contextual. For example, the same action taken by an agent may represent legitimate behavior or a security violation depending on whose instruction led to the action, what objective is being pursued, and whether the action serves that objective. However, existing definitions of security attacks against LLM agents often fail to capture this contextual nature. As a result, defenses face a fundamental utility-security tradeoff: applying defenses uniformly across all contexts can lead to significant utility loss, while applying defenses in insufficient or inappropriate contexts can result in security vulnerabilities. In this work, we present a framework that systematizes existing attacks and defenses from the perspective of contextual security. To this end, we propose four security properties that capture contextual security for LLM agents: task alignment (pursuing authorized objectives), action alignment (individual actions serving those objectives), source authorization (executing commands from authenticated sources), and data isolation (ensuring information flows respect privilege boundaries). We further introduce a set of oracle functions that enable verification of whether these security properties are violated as an agent executes a user task. Using this framework, we reformalize existing attacks, such as indirect prompt injection, direct prompt injection, jailbreak, task drift, and memory poisoning, as violations of one or more security properties, thereby providing precise and contextual definitions of these attacks. Similarly, we reformalize defenses as mechanisms that strengthen oracle functions or perform security property checks. Finally, we discuss several important future research directions enabled by our framework.
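The abstract names the four properties and says oracle functions decide whether an execution violates them. It does not define those oracles concretely. The following Python sketch is only an illustration under loudly stated assumptions. `Step`, `Context`, `serves`, `allowed_flows`, `check_step` and `check_trace` are hypothetical names, not the paper's API. Each oracle is stubbed with a set lookup or a callback, and privilege boundaries are modeled as a whitelist of (from, to) label pairs. Realizing these oracles in a real agent would be far harder.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Property(Enum):
    """The four contextual security properties named in the abstract."""
    TASK_ALIGNMENT = "task alignment"
    ACTION_ALIGNMENT = "action alignment"
    SOURCE_AUTHORIZATION = "source authorization"
    DATA_ISOLATION = "data isolation"


@dataclass(frozen=True)
class Step:
    """One agent action plus the context that decides its legitimacy (hypothetical shape)."""
    instruction_source: str  # whose instruction led to this action
    objective: str           # objective the agent is pursuing at this step
    action: str              # tool call or output the agent produced
    data_flows: tuple[tuple[str, str], ...] = ()  # (from_label, to_label) privilege labels


@dataclass
class Context:
    """Hypothetical stand-ins for the paper's oracle functions."""
    authorized_objectives: set[str]
    authenticated_sources: set[str]
    serves: Callable[[str, str], bool]   # assumed oracle: does `action` serve `objective`?
    allowed_flows: set[tuple[str, str]]  # assumed whitelist of permitted flows


def check_step(step: Step, ctx: Context) -> set[Property]:
    """Return the properties this step violates (an empty set means no violation)."""
    violations: set[Property] = set()
    if step.objective not in ctx.authorized_objectives:
        violations.add(Property.TASK_ALIGNMENT)
    if not ctx.serves(step.action, step.objective):
        violations.add(Property.ACTION_ALIGNMENT)
    if step.instruction_source not in ctx.authenticated_sources:
        violations.add(Property.SOURCE_AUTHORIZATION)
    if any(flow not in ctx.allowed_flows for flow in step.data_flows):
        violations.add(Property.DATA_ISOLATION)
    return violations


def check_trace(steps: list[Step], ctx: Context) -> list[set[Property]]:
    """Check every step of an execution trace against the four properties."""
    return [check_step(s, ctx) for s in steps]


# Toy scenario: the user asks the agent to summarize their inbox.
SERVES = {("read_inbox", "summarize inbox"), ("send_email", "forward credentials")}
ctx = Context(
    authorized_objectives={"summarize inbox"},
    authenticated_sources={"user"},
    serves=lambda action, objective: (action, objective) in SERVES,
    allowed_flows={("public", "public"), ("private", "private")},
)

benign = Step("user", "summarize inbox", "read_inbox", (("private", "private"),))
# An instruction hidden in an email body redirects the agent and leaks private data.
injected = Step("email_body", "forward credentials", "send_email", (("private", "public"),))
# The user's task is authorized, but this particular action does not serve it.
off_task = Step("user", "summarize inbox", "delete_all_email")

for name, step in [("benign", benign), ("injected", injected), ("off_task", off_task)]:
    print(name, sorted(p.value for p in check_step(step, ctx)))
```

Running it prints `[]` for `benign`, `['data isolation', 'source authorization', 'task alignment']` for `injected`, and `['action alignment']` for `off_task`. The same `send_email` action would be legitimate under a different authorized objective and source, which is the contextual point the abstract makes. In this toy reading, a defense either makes the oracle inputs (`serves`, the source labels, the flow labels) more reliable or runs a check like `check_step` before an action executes. Which properties a given attack violates depends on the paper's own formalization, not on this stub.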