~ similar to 2604.03598v1· 20 results
The paper evaluates prompt injection detection in a deployment-aware, multi-regime framework, finding that detection performance is highly dependent on the operational setting and that no single detec…
Shihao Weng, Yang Feng, Jinrui Zhang, Xiaofei Xie +2 more
The paper introduces ARGUS, a defense mechanism that uses provenance-aware decision auditing to protect LLM agents from sophisticated, context-aware prompt injection attacks, significantly reducing th…
The paper introduces 'log-substrate prompt injection,' demonstrating that attacker-controlled log fields can be used to manipulate LLM-powered security analysis, with persona hijacking and context man…
Runpeng Geng, Chenlong Yin, Yanting Wang, Ying Chen +1 more
The paper introduces PIArena, a unified and extensible platform designed to address the lack of standardized evaluation for prompt injection, revealing critical limitations in current state-of-the-art…
Priyal Deep, Shane Emmons, Amy Fox, Kyle Bacon +3 more
The paper evaluates prompt injection defenses and finds that only external output filtering, implemented in application code, reliably prevents secret leaks from LLMs, demonstrating that model-based d…
The paper introduces Transient Turn Injection (TTI), a novel multi-turn attack technique that exploits stateless moderation in LLMs by distributing adversarial intent across isolated interactions, rev…
The paper proposes the Adversarial Prompt Disentanglement (APD) framework, a novel defense mechanism that proactively identifies and neutralizes malicious components in LLM prompts, achieving over 85%…
The paper proposes the Adversarial Prompt Disentanglement (APD) framework, a novel defense that proactively identifies and neutralizes malicious components in LLM prompts, achieving over 85% reduction…
The paper introduces SONAR, a prompt sanitization framework that uses natural language inference metrics to identify and remove malicious instructions injected into LLM prompts, achieving near-zero at…
This paper provides a large-scale empirical analysis of indirect prompt injections found in webpages, revealing that prompt-based interference is a widespread, persistent, and growing threat targeting…
Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa +4 more
The paper proposes a vision for system-level defenses against indirect prompt injection attacks targeting AI agents, emphasizing structured control and human oversight.
The paper introduces Tree structured Injection for Payloads (TIP), a novel black-box attack framework that reliably generates stealthy injection payloads to seize control of LLM agents utilizing the M…
Taein Lim, Seongyong Ju, Munhyeok Kim, Hyunjun Kim +1 more
The paper introduces CyBiasBench, a comprehensive benchmark that quantifies the inherent, agent-specific bias in LLM agents' attack selection patterns in cybersecurity scenarios.
The paper introduces PromptFuzz-SC, a novel semantic-character dual-space mutation framework, demonstrating that combining both semantic and character-level attacks significantly improves the robustne…
The paper systematically evaluates various defense mechanisms against persistent memory attacks on LLM agents, finding that only tool-gating at the memory layer (Memory Sandbox) effectively mitigates…
The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…
The paper introduces GuardPhish, a large-scale dataset and evaluation framework, demonstrating that even high-performing open-source LLMs can generate actionable phishing content despite accurate inte…
LocalAlign proposes a generalizable prompt injection defense by generating near-target adversarial examples, which enforces a tighter robustness boundary around the correct model response.
This paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically analyzing how the injection depth and payload framing affect attack success rates, finding that inje…
The paper investigates indirect prompt injection vulnerabilities in ReAct agents by systematically varying the injection depth, payload framing, and turn budget, finding that injection depth is the do…