Papers similar to 2605.18918v1

~ similar to 2605.18918v1· 20 results

cs.CRcs.AIcs.LGRecentMar 29, 2026

Evaluating Prompt Injection Defenses for Educational LLM Tutors: Security-Usability-Latency Trade-offs

The paper evaluates prompt-injection defenses for educational LLM tutors, demonstrating that optimal security requires balancing adversarial robustness, usability, and latency, and proposing a compreh…

View →

cs.CRRecentApr 4, 2026

AttackEval: A Systematic Empirical Study of Prompt Injection Attack Effectiveness Against Large Language Models

Jackson Wang

AttackEval systematically evaluates the effectiveness of 250 prompt injection prompts across ten attack categories, finding that composite and obfuscation attacks are highly effective against current…

View →

cs.CRcs.AIRecentMay 14, 2026

WARD: Adversarially Robust Defense of Web Agents Against Prompt Injections

Tri Cao, Yulin Chen, Hieu Cao, Yibo Li +7 more

The paper proposes WARD, a robust and efficient defense model that secures web agents against prompt injection attacks embedded in web content, achieving high recall and low false positives even again…

View →

cs.CRRecentApr 27, 2026

AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization

Zonghao Ying, Haozheng Wang, Jiangfan Liu, Quanchen Zou +4 more

AgentVisor is a novel defense framework that uses semantic virtualization, inspired by OS principles, to significantly reduce LLM agent vulnerability to prompt injection while maintaining high utility…

View →

cs.CRcs.AIcs.CLRecentJun 1, 2026

Defenses & Enablers For Skill Injection Attacks on Terminal Based Agents

Yoshinari Fujinuma, Varun Gangal, Traian Rebedea, Makesh Narasimhan Sreedhar +3 more

This paper introduces and evaluates guardian-based defenses, showing that an intermediary LLM agent can significantly reduce the success rate of skill injection attacks on terminal-based agents, even…

View →

cs.CRcs.AIcs.CLRecentJun 1, 2026

Defenses & Enablers For Skill Injection Attacks on Terminal Based Agents

Yoshinari Fujinuma, Varun Gangal, Traian Rebedea, Makesh Narasimhan Sreedhar +3 more

This paper proposes and evaluates guardian-based defenses, both dynamic and static, to mitigate skill injection attacks targeting LLM agents that rely on reusable procedural skills.

View →

cs.CRRecentApr 14, 2026

WebAgentGuard: A Reasoning-Driven Guard Model for Detecting Prompt Injection Attacks in Web Agents

Yulin Chen, Tri Cao, Haoran Li, Yue Liu +6 more

The paper introduces WebAgentGuard, a novel reasoning-driven, multimodal guard model that effectively detects prompt injection attacks in vulnerable web agents without compromising their functionality…

View →

cs.CRcs.SERecentMay 5, 2026

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

Shihao Weng, Yang Feng, Jinrui Zhang, Xiaofei Xie +2 more

The paper introduces ARGUS, a defense mechanism that uses provenance-aware decision auditing to protect LLM agents from sophisticated, context-aware prompt injection attacks, significantly reducing th…

View →

cs.CRRecentMay 2, 2026

LocalAlign: Enabling Generalizable Prompt Injection Defense via Generation of Near-Target Adversarial Examples for Alignment Training

Yuyang Gong, Zihao Wang, Jiawei Liu, XiaoFeng Wang

LocalAlign proposes a generalizable prompt injection defense by generating near-target adversarial examples, which enforces a tighter robustness boundary around the correct model response.

View →

cs.CRcs.CLRecentMay 31, 2026

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

Yunhao Feng, Xiaohu Du, Xinhao Deng, Yifan Ding +12 more

BraveGuard is a self-evolving defense framework that significantly improves the safety monitoring of computer-use agents by generating guard model supervision from open-world threat discovery and real…

View →

cs.CRcs.CLRecentMay 31, 2026

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

Yunhao Feng, Yifan Ding, Xiaohu Du, Ming Wen +12 more

BraveGuard is a self-evolving defense framework that improves the safety of computer-use agents by training guard models on open-world, multi-step threat trajectories rather than static benchmarks.

View →

cs.CRcs.AIcs.CLRecentMay 29, 2026

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more

This paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a dynamic defense mechanism that traces and sanitizes untrusted control content i…

View →

cs.CRcs.AIcs.CLRecentMay 29, 2026

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more

The paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a defense mechanism that detects and sanitizes backdoor content planted across mul…

View →

cs.CRcs.AIRecentMar 24, 2026

The Cognitive Firewall:Securing Browser Based AI Agents Against Indirect Prompt Injection Via Hybrid Edge Cloud Defense

Qianlong Lan, Anuj Kaul

The Cognitive Firewall is a hybrid edge-cloud defense architecture that significantly reduces the attack success rate of Indirect Prompt Injection against browser-based AI agents by combining local vi…

View →

cs.LGcs.AIcs.CRRecentMar 25, 2026

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

Alexander Panfilov, Peter Romov, Igor Shilov, Yves-Alexandre de Montjoye +2 more

The paper demonstrates that using advanced AI agents in an autoresearch loop can discover novel and highly effective adversarial attack algorithms, significantly advancing the state-of-the-art for jai…

View →

cs.CRcs.AIRecentApr 28, 2026

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents

Mengyao Du, Han Fang, Haokai Ma, Jiahao Chen +3 more

SnapGuard proposes a lightweight, multimodal method to detect prompt injection attacks in screenshot-based web agents by analyzing visual stability and contrast-polarity textual signals, achieving hig…

View →

cs.CRcs.AIcs.CLRecentJun 3, 2026

Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming

Nicholas Saban

The paper benchmarks current frontier computer-using agents against hand-crafted attacks, finding that while they are highly safe in browser tasks, this safety does not generalize to other domains lik…

View →

cs.CRcs.AIcs.LGRecentMay 29, 2026

Depth-Dependent Indirect Prompt Injection in Tool-Calling ReAct Agents: Injection Depth, Payload Framing, and Turn-Budget Sensitivity

Mohammadreza Rashidi

This paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically analyzing how the injection depth and payload framing affect attack success rates, finding that inje…

View →

cs.CRcs.AIcs.LGRecentMay 29, 2026

Depth-Dependent Indirect Prompt Injection in Tool-Calling ReAct Agents: Injection Depth, Payload Framing, and Turn-Budget Sensitivity

Mohammadreza Rashidi

The paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically varying the injection depth, payload framing, and turn budget, finding that injection depth is the do…

View →

cs.CRcs.CLRecentMay 10, 2026

AgentShield: Deception-based Compromise Detection for Tool-using LLM Agents

Yassin H. Rassul, Tarik A. Rashid

AgentShield is a deception-based framework that detects successful indirect prompt injections in tool-using LLM agents across multiple languages by placing traps within the agent's tool interface.

View →