~ similar to 2604.01194v1· 20 results
Shihao Weng, Yang Feng, Jinrui Zhang, Xiaofei Xie +2 more
The paper introduces ARGUS, a defense mechanism that uses provenance-aware decision auditing to protect LLM agents from sophisticated, context-aware prompt injection attacks, significantly reducing th…
Zonghao Ying, Haozheng Wang, Jiangfan Liu, Quanchen Zou +4 more
AgentVisor is a novel defense framework that uses semantic virtualization, inspired by OS principles, to significantly reduce LLM agent vulnerability to prompt injection while maintaining high utility…
Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more
This paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a dynamic defense mechanism that traces and sanitizes untrusted control content i…
Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more
The paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a defense mechanism that detects and sanitizes backdoor content planted across mul…
ClawGuard is a novel runtime security framework that deterministically enforces user-confirmed rules at tool-call boundaries to protect LLM agents from indirect prompt injection.
Yulin Chen, Tri Cao, Haoran Li, Yue Liu +6 more
The paper introduces WebAgentGuard, a novel reasoning-driven, multimodal guard model that effectively detects prompt injection attacks in vulnerable web agents without compromising their functionality…
PlanGuard is a training-free defense framework that uses an isolated Planner and hierarchical verification to defend LLM agents against Indirect Prompt Injection by verifying the consistency of planne…
The paper introduces Prompt Control-Flow Integrity (PCFI), a priority-aware runtime defense that models LLM prompts as structured segments to intercept prompt injection attacks with high accuracy and…
Chia-Pei, Chen, Kentaroh Toyoda, Anita Lai +1 more
The paper introduces IPI-proxy, an open-source intercepting proxy toolkit designed to red-team web-browsing AI agents by injecting adversarial payloads into live HTTP responses from whitelisted domain…
Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa +4 more
The paper proposes a vision for system-level defenses against indirect prompt injection attacks targeting AI agents, emphasizing structured control and human oversight.
The paper introduces AGENTREDBENCH, a dynamic redteaming benchmark that significantly measures indirect prompt injection threats in LLM agents using third-party integrations, and releases AGENTREDGUAR…
The paper introduces AGENTREDBENCH, a dynamic redteaming benchmark that significantly measures indirect prompt injection threats in LLM agents using SaaS integrations, and releases AGENTREDGUARD, a su…
Davis Brown, Samarth Bhargav, Arav Santhanam, Kasper Hong +6 more
The paper introduces a novel stateful online monitoring system that detects distributed multi-agent cyberattacks by aggregating weak suspiciousness signals across many user accounts, overcoming the bl…
Davis Brown, Samarth Bhargav, Arav Santhanam, Kasper Hong +6 more
The paper introduces a novel stateful online monitoring system that detects distributed multi-agent cyberattacks by aggregating weak suspiciousness signals across many user accounts, overcoming the bl…
AttackEval systematically evaluates the effectiveness of 250 prompt injection prompts across ten attack categories, finding that composite and obfuscation attacks are highly effective against current…
This paper provides a large-scale empirical analysis of indirect prompt injections found in webpages, revealing that prompt-based interference is a widespread, persistent, and growing threat targeting…
The paper argues that prompt injection is a fundamental vulnerability in AI agents, proposing that Contextual Integrity (CI) offers a principled framework to understand and mitigate context-sensitive…
The paper introduces NeuroTaint, a novel taint tracking framework that adapts information flow analysis for LLM agents by modeling taint propagation as semantic transformation and causal influence, si…
The paper introduces Agent-ToM, a Theory-of-Mind (ToM) based framework that learns to monitor autonomous LLM agents by explicitly reasoning about their hidden beliefs and intentions to detect covert m…
AgentTrust is a novel runtime safety layer that intercepts and evaluates AI agent tool calls before execution, achieving high accuracy in detecting unsafe actions across complex and obfuscated scenari…