~ similar to 2605.30454v1· 20 results
The vulnerability of LLM agents to prompt injection depends not on the specific channel (tool output vs. tool description) but on the interaction between the model and the surface itself.
This paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically analyzing how the injection depth and payload framing affect attack success rates, finding that inje…
The paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically varying the injection depth, payload framing, and turn budget, finding that injection depth is the do…
AgentShield is a deception-based framework that detects successful indirect prompt injections in tool-using LLM agents across multiple languages by placing traps within the agent's tool interface.
The paper identifies a critical vulnerability, the Camouflage Detection Gap (CDG), where standard LLM injection detectors fail dramatically when malicious payloads mimic the target domain's language a…
The paper introduces AgentSecBench, a security evaluation framework that measures prompt injection, privacy leakage, and tool-use integrity in LLM agents by defining formal security games and testing…
The paper systematically maps LLM agent vulnerabilities by testing 10,000 prompt variations, finding that 'goal reframing' language is the primary trigger for exploitation, rather than broad adversari…
Zonghao Ying, Haozheng Wang, Jiangfan Liu, Quanchen Zou +4 more
AgentVisor is a novel defense framework that uses semantic virtualization, inspired by OS principles, to significantly reduce LLM agent vulnerability to prompt injection while maintaining high utility…
Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more
This paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a dynamic defense mechanism that traces and sanitizes untrusted control content i…
Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu +3 more
The paper introduces ClawTrojan, a benchmark for multi-step trojan attacks against LLM agents, and proposes DASGuard, a defense mechanism that detects and sanitizes backdoor content planted across mul…
The paper benchmarks current frontier computer-using agents against hand-crafted attacks, finding that while they are highly safe in browser tasks, this safety does not generalize to other domains lik…
The paper empirically analyzes the susceptibility of seven widely used AI-assisted development tools (MCP clients) to prompt injection via tool-poisoning, revealing significant disparities in their se…
The paper introduces the Safety Asymmetry Score (SAS) to measure how a model's vulnerability to adversarial content changes based on whether the malicious input arrives via the user message, tool meta…
The paper introduces the Safety Asymmetry Score (SAS) to measure how a model's susceptibility to adversarial attacks changes based on whether the malicious content arrives via the user message, tool m…
Priyal Deep, Shane Emmons, Amy Fox, Kyle Bacon +3 more
The paper evaluates prompt injection defenses and finds that only external output filtering, implemented in application code, reliably prevents secret leaks from LLMs, demonstrating that model-based d…
This study empirically measures the consistency and success rate of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation capabilit…
This study empirically measures the consistency and effectiveness of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation rates am…
The security of LLM agents is critically dependent on their system prompt configuration, which creates a brittle attack surface that can be exploited by attackers inverting the prompt's core assumptio…
The paper identifies Mid-Session Tool Injection (MSTI) as a novel threat in the WebMCP protocol, demonstrating that attackers can manipulate the visible or perceived set of tools available to AI agent…
Lecheng Yan, Ruizhe Li, Xicheng Han, Wenxi Li +4 more
The paper introduces a new security benchmark and framework to defend LLM agents against 'cognitive poisoning,' where malicious tools build trust through benign feedback before executing a harmful fin…