Papers similar to 2604.11790v2

~ similar to 2604.11790v2· 20 results

cs.CRcs.AIcs.CLRecentMay 29, 2026

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more

This paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a dynamic defense mechanism that traces and sanitizes untrusted control content i…

View →

cs.CRcs.AIcs.CLRecentMay 29, 2026

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more

The paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a defense mechanism that detects and sanitizes backdoor content planted across mul…

View →

cs.CRRecentApr 27, 2026

AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization

Zonghao Ying, Haozheng Wang, Jiangfan Liu, Quanchen Zou +4 more

AgentVisor is a novel defense framework that uses semantic virtualization, inspired by OS principles, to significantly reduce LLM agent vulnerability to prompt injection while maintaining high utility…

View →

cs.CRcs.AIRecentApr 7, 2026

ClawLess: A Security Model of AI Agents

Hongyi Lu, Nian Liu, Shuai Wang, Fengwei Zhang

ClawLess introduces a formally verified security framework that enforces fine-grained policies on autonomous AI agents, mitigating risks associated with their ability to run code and retrieve informat…

View →

cs.CRRecentApr 11, 2026

PlanGuard: Defending Agents against Indirect Prompt Injection via Planning-based Consistency Verification

Guangyu Gong, Zizhuang Deng

PlanGuard is a training-free defense framework that uses an isolated Planner and hierarchical verification to defend LLM agents against Indirect Prompt Injection by verifying the consistency of planne…

View →

cs.CRcs.SERecentMay 5, 2026

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

Shihao Weng, Yang Feng, Jinrui Zhang, Xiaofei Xie +2 more

The paper introduces ARGUS, a defense mechanism that uses provenance-aware decision auditing to protect LLM agents from sophisticated, context-aware prompt injection attacks, significantly reducing th…

View →

cs.CRRecentApr 8, 2026

TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation

Hengkai Ye, Zhechang Zhang, Jinyuan Jia, Hong Hu

The paper introduces TRUSTDESC, a novel framework that prevents tool poisoning attacks in LLM applications by automatically generating highly accurate and trusted tool descriptions directly from the t…

View →

cs.CRcs.AIRecentApr 1, 2026

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Anubhab Sahu, Diptisha Samanta, Reza Soosahabi

The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…

View →

cs.CRRecentMay 25, 2026

AgentSecBench: Measuring Prompt Injection, Privacy Leakage, and Tool-Use Integrity in LLM Agents

Faruk Alpay, Taylan Alpay

The paper introduces AgentSecBench, a security evaluation framework that measures prompt injection, privacy leakage, and tool-use integrity in LLM agents by defining formal security games and testing…

View →

cs.CRcs.AIRecentMar 29, 2026

A Security Analysis of the OpenClaw AI Agent Framework

Surada Suwansathit, Yuxuan Zhang, Guofei Gu

This paper analyzes 470 security advisories in the OpenClaw AI agent framework, demonstrating that the system's structural weakness lies in per-layer trust enforcement, enabling cross-layer remote cod…

View →

cs.CRcs.AIcs.CLRecentJun 1, 2026

AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations

Hiskias Dingeto, William Leeney

The paper introduces AGENTREDBENCH, a dynamic redteaming benchmark that significantly measures indirect prompt injection threats in LLM agents using third-party integrations, and releases AGENTREDGUAR…

View →

cs.CRcs.AIcs.CLRecentJun 1, 2026

AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations

Hiskias Dingeto, Will Leeney

The paper introduces AGENTREDBENCH, a dynamic redteaming benchmark that significantly measures indirect prompt injection threats in LLM agents using SaaS integrations, and releases AGENTREDGUARD, a su…

View →

cs.CRRecentMay 20, 2026

VIPER-MCP: Detecting and Exploiting Taint-Style Vulnerabilities in Model Context Protocol Servers

Pengyu Sun, Qishu Jin, Enhao Huang, Zifeng Kang +3 more

VIPER-MCP is a novel, end-to-end automated framework that detects and dynamically confirms the exploitability of taint-style vulnerabilities in Model Context Protocol (MCP) servers, achieving high-fid…

View →

cs.CRcs.AIRecentMay 18, 2026

Prompts Don't Protect: Architectural Enforcement via MCP Proxy for LLM Tool Access Control

Rohith Uppala

The paper proposes an architectural proxy (MCP) to enforce robust, reliable tool access control for LLM agents, demonstrating that this structural enforcement is necessary because prompt-based restric…

View →

cs.CRRecentMay 27, 2026

AgentGuard: An Attribute-Based Access Control Framework for Tool-Use LLM-Based Agent

Jiaqi Luo, Songyang Peng, Jiarun Dai, Zhile Chen +5 more

AgentGuard is an attribute-based access control framework designed to mitigate severe security risks, such as privacy leakage and system compromise, in tool-using LLM-based agents.

View →

cs.CRRecentApr 1, 2026

AgentWatcher: A Rule-based Prompt Injection Monitor

Yanting Wang, Wei Zou, Runpeng Geng, Jinyuan Jia

AgentWatcher is a novel, rule-based monitor designed to detect prompt injection attacks in LLM agents by focusing detection on causally influential context segments, thereby improving scalability and…

View →

cs.CRRecentMar 19, 2026

Prompt Control-Flow Integrity: A Priority-Aware Runtime Defense Against Prompt Injection in LLM Systems

Md Takrim Ul Alam, Akif Islam, Mohd Ruhul Ameen, Abu Saleh Musa Miah +1 more

The paper introduces Prompt Control-Flow Integrity (PCFI), a priority-aware runtime defense that models LLM prompts as structured segments to intercept prompt injection attacks with high accuracy and…

View →

cs.CRRecentApr 23, 2026

Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

Zihan Wang, Rui Zhang, Yu Liu, Chi Liu +3 more

This paper presents the first systematic study of black-box skill stealing attacks against proprietary LLM agents, demonstrating that structured agent skills can be easily extracted, posing a signific…

View →

cs.CRcs.AIRecentMar 31, 2026

Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks

Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa +4 more

The paper proposes a vision for system-level defenses against indirect prompt injection attacks targeting AI agents, emphasizing structured control and human oversight.

View →

cs.CRRecentApr 14, 2026

WebAgentGuard: A Reasoning-Driven Guard Model for Detecting Prompt Injection Attacks in Web Agents

Yulin Chen, Tri Cao, Haoran Li, Yue Liu +6 more

The paper introduces WebAgentGuard, a novel reasoning-driven, multimodal guard model that effectively detects prompt injection attacks in vulnerable web agents without compromising their functionality…

View →