~ similar to 2605.10176v1· 20 results
AttackEval systematically evaluates the effectiveness of 250 prompt injection prompts across ten attack categories, finding that composite and obfuscation attacks are highly effective against current…
This paper provides a large-scale empirical analysis of indirect prompt injections found in webpages, revealing that prompt-based interference is a widespread, persistent, and growing threat targeting…
Priyal Deep, Shane Emmons, Amy Fox, Kyle Bacon +3 more
The paper evaluates prompt injection defenses and finds that only external output filtering, implemented in application code, reliably prevents secret leaks from LLMs, demonstrating that model-based d…
Haozhen Wang, Haoyue Liu, Jionghao Zhu, Zhichao Wang +2 more
The paper introduces PIDP-Attack, a novel compound adversarial attack that combines prompt injection with database poisoning to manipulate Retrieval-Augmented Generation (RAG) systems against arbitrar…
The paper evaluates two novel LLM-based systems, RADAGAS and RefleXQLi, for generating adversarial SQL injection payloads, finding that RADAGAS-GPT4o achieves a high bypass rate, particularly against…
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
The paper introduces 'log-substrate prompt injection,' demonstrating that attacker-controlled log fields can be used to manipulate LLM-powered security analysis, with persona hijacking and context man…
The paper introduces SONAR, a prompt sanitization framework that uses natural language inference metrics to identify and remove malicious instructions injected into LLM prompts, achieving near-zero at…
Shihao Weng, Yang Feng, Jinrui Zhang, Xiaofei Xie +2 more
The paper introduces ARGUS, a defense mechanism that uses provenance-aware decision auditing to protect LLM agents from sophisticated, context-aware prompt injection attacks, significantly reducing th…
The paper proposes the Adversarial Prompt Disentanglement (APD) framework, a novel defense mechanism that proactively identifies and neutralizes malicious components in LLM prompts, achieving over 85%…
The paper proposes the Adversarial Prompt Disentanglement (APD) framework, a novel defense that proactively identifies and neutralizes malicious components in LLM prompts, achieving over 85% reduction…
The paper introduces Prompt Control-Flow Integrity (PCFI), a priority-aware runtime defense that models LLM prompts as structured segments to intercept prompt injection attacks with high accuracy and…
The security of LLM agents is critically dependent on their system prompt configuration, which creates a brittle attack surface that can be exploited by attackers inverting the prompt's core assumptio…
SecureMCP proposes a novel, policy-enforced framework that integrates Role-Based Access Control (RBAC) with an MCP server to provide multi-layer, fine-grained defense against malicious LLM-generated S…
The paper empirically analyzes the susceptibility of seven widely used AI-assisted development tools (MCP clients) to prompt injection via tool-poisoning, revealing significant disparities in their se…
The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…
The paper introduces PromptFuzz-SC, a novel semantic-character dual-space mutation framework, demonstrating that combining both semantic and character-level attacks significantly improves the robustne…
The paper tested the hypothesis that wrapping untrusted prompt inputs in mock tool calls would improve LLM robustness, but found that this technique generally fails and can even increase vulnerability…
The paper introduces Tree structured Injection for Payloads (TIP), a novel black-box attack framework that reliably generates stealthy injection payloads to seize control of LLM agents utilizing the M…
Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa +4 more
The paper proposes a vision for system-level defenses against indirect prompt injection attacks targeting AI agents, emphasizing structured control and human oversight.