Papers similar to 2605.27823

~ similar to 2605.27823· 20 results

cs.CRcs.AIcs.CVRecentMay 27, 2026

Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security

The paper proposes the Adversarial Prompt Disentanglement (APD) framework, a novel defense mechanism that proactively identifies and neutralizes malicious components in LLM prompts, achieving over 85%…

View →

cs.CRRecentApr 4, 2026

AttackEval: A Systematic Empirical Study of Prompt Injection Attack Effectiveness Against Large Language Models

Jackson Wang

AttackEval systematically evaluates the effectiveness of 250 prompt injection prompts across ten attack categories, finding that composite and obfuscation attacks are highly effective against current…

View →

cs.CRRecentMay 20, 2026

Adversarial Reframing: A Framework for Targeted Generation in Language Models

Shahnewaz Karim Sakib, Swati Kar, Anindya Bijoy Das

The paper introduces THREAT, a novel reasoning-driven framework that efficiently discovers highly effective and targeted jailbreak prompts for LLMs, revealing previously unknown safety vulnerabilities…

View →

cs.AIRecentMay 31, 2026

Dive into Ambiguity: A*-Inspired Multi-Agents Commonsense Obfuscation Attack on LLM Prompts

Boxuan Wang, Zhuoyun Li, Xiaowei Huang, Yi Dong

The paper introduces an A*-inspired framework to generate highly effective and efficient adversarial prompts that cause LLMs to hallucinate commonsense errors while maintaining the original prompt's i…

View →

cs.CRcs.AIcs.CVRecentMay 15, 2026

DarkLLM: Learning Language-Driven Adversarial Attacks with Large Language Models

Ye Sun, Xin Wang, Jiaming Zhang, Yifeng Gao +6 more

DarkLLM introduces a novel framework that uses a Large Language Model (LLM) to translate natural language instructions into flexible, latent adversarial attack vectors, demonstrating a systemic vulner…

View →

cs.CRcs.AIRecentMay 11, 2026

When Prompts Become Payloads: A Framework for Mitigating SQL Injection Attacks in Large Language Model-Driven Applications

Farzad Nourmohammadzadeh Motlagh, Mehrdad Hajizadeh, Mehryar Majd, Pejman Najafi +2 more

The paper proposes a multi-layered security framework to detect and mitigate SQL injection attacks that occur when Large Language Models translate natural language prompts into database queries.

View →

cs.CRcs.AIRecentMar 17, 2026

Security Assessment and Mitigation Strategies for Large Language Models: A Comprehensive Defensive Framework

Taiwo Onitiju, Iman Vakilinia

The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…

View →

cs.CRcs.AIRecentMay 1, 2026

A Sentence Relation-Based Approach to Sanitizing Malicious Instructions

Soumil Datta, Melissa Umble, Daniel S. Brown, Guanhong Tao

The paper introduces SONAR, a prompt sanitization framework that uses natural language inference metrics to identify and remove malicious instructions injected into LLM prompts, achieving near-zero at…

View →

cs.CRcs.AIRecentMay 24, 2026

Reflect-Guard: Enhancing LLM Safeguards against Adversarial Prompts via Logical Self-Reflection

Lixing Lin, Juli You, Yue Li, Luyun Lin +3 more

Reflect-Guard enhances LLM safety classifiers by integrating logical self-reflection, significantly improving detection of sophisticated adversarial jailbreak prompts.

View →

cs.LGcs.AIcs.CRRecentMar 25, 2026

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

Alexander Panfilov, Peter Romov, Igor Shilov, Yves-Alexandre de Montjoye +2 more

The paper demonstrates that using advanced AI agents in an autoresearch loop can discover novel and highly effective adversarial attack algorithms, significantly advancing the state-of-the-art for jai…

View →

cs.CRcs.AIRecentMar 30, 2026

Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey

Bhavuk Jain, Sercan Ö. Arık, Hardeo K. Thakur

This survey provides a comprehensive taxonomy and vulnerability-centric analysis of adversarial attacks targeting Multimodal Large Language Models (MLLMs), offering an explanatory framework for enhanc…

View →

cs.CRcs.AIcs.CLRecentMay 5, 2026

Exposing LLM Safety Gaps Through Mathematical Encoding:New Attacks and Systematic Analysis

Haoyu Zhang, Mohammad Zandsalimy, Shanu Sushmita

The paper demonstrates that encoding harmful prompts as genuine mathematical problems, rather than just using mathematical formatting, effectively bypasses the safety filters of large language models.

View →

cs.CRRecentMay 2, 2026

LocalAlign: Enabling Generalizable Prompt Injection Defense via Generation of Near-Target Adversarial Examples for Alignment Training

Yuyang Gong, Zihao Wang, Jiawei Liu, XiaoFeng Wang

LocalAlign proposes a generalizable prompt injection defense by generating near-target adversarial examples, which enforces a tighter robustness boundary around the correct model response.

View →

cs.CRcs.AIRecentApr 1, 2026

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Anubhab Sahu, Diptisha Samanta, Reza Soosahabi

The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…

View →

cs.CRcs.AIRecentMar 26, 2026

The System Prompt Is the Attack Surface: How LLM Agent Configuration Shapes Security and Creates Exploitable Vulnerabilities

Ron Litvak

The security of LLM agents is critically dependent on their system prompt configuration, which creates a brittle attack surface that can be exploited by attackers inverting the prompt's core assumptio…

View →

cs.CRRecentApr 14, 2026

DeepSeek Robustness Against Semantic-Character Dual-Space Mutated Prompt Injection

Junyu Ren, Xingjian Pan, Wensheng Gan, Philip S. Yu

The paper introduces PromptFuzz-SC, a novel semantic-character dual-space mutation framework, demonstrating that combining both semantic and character-level attacks significantly improves the robustne…

View →

cs.CRcs.SERecentMay 5, 2026

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

Shihao Weng, Yang Feng, Jinrui Zhang, Xiaofei Xie +2 more

The paper introduces ARGUS, a defense mechanism that uses provenance-aware decision auditing to protect LLM agents from sophisticated, context-aware prompt injection attacks, significantly reducing th…

View →

cs.CVcs.AIcs.CRRecentApr 10, 2026

Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection

Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong

The paper introduces ImageProtector, a user-side method that embeds an imperceptible perturbation into images to prevent Multi-modal Large Language Models (MLLMs) from analyzing and extracting sensiti…

View →

cs.CRRecentMar 19, 2026

Prompt Control-Flow Integrity: A Priority-Aware Runtime Defense Against Prompt Injection in LLM Systems

Md Takrim Ul Alam, Akif Islam, Mohd Ruhul Ameen, Abu Saleh Musa Miah +1 more

The paper introduces Prompt Control-Flow Integrity (PCFI), a priority-aware runtime defense that models LLM prompts as structured segments to intercept prompt injection attacks with high accuracy and…

View →

cs.CRRecentApr 19, 2026

GuardPhish: Securing Open-Source LLMs from Phishing Abuse

Rina Mishra, Gaurav Varshney, Doddipatla Sesha Sahithi

The paper introduces GuardPhish, a large-scale dataset and evaluation framework, demonstrating that even high-performing open-source LLMs can generate actionable phishing content despite accurate inte…

View →