ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2604.12548v1· 20 results

cs.CRRecentApr 4, 2026

AttackEval: A Systematic Empirical Study of Prompt Injection Attack Effectiveness Against Large Language Models

Jackson Wang

AttackEval systematically evaluates the effectiveness of 250 prompt injection prompts across ten attack categories, finding that composite and obfuscation attacks are highly effective against current…

View →
cs.CRcs.CLcs.SERecentMay 28, 2026

Minimal Prompt Perturbations Lead to Code Vulnerabilities: Prompt Fragility and Hidden-State Signals in Coding LLMs

Alexander Sternfeld, Andrei Kucharavy, Ljiljana Dolamic

Minor, single-character perturbations to prompts can significantly degrade the security of code generated by LLMs, suggesting that prompt fragility is a major security concern beyond simple prompt inj…

View →
cs.AIRecentMay 31, 2026

Dive into Ambiguity: A*-Inspired Multi-Agents Commonsense Obfuscation Attack on LLM Prompts

Boxuan Wang, Zhuoyun Li, Xiaowei Huang, Yi Dong

The paper introduces an A*-inspired framework to generate highly effective and efficient adversarial prompts that cause LLMs to hallucinate commonsense errors while maintaining the original prompt's i…

View →
cs.CRRecentMay 18, 2026

On the Geometric Limits of Transformer Defenses against Obfuscation Attacks: Latent Embedding Collapse & Performance Robustness Gap

Becky Mashaido, Tapadhir Das

The paper demonstrates that high detection performance against obfuscated prompts does not guarantee representational robustness, identifying a phenomenon called latent embedding collapse.

View →
cs.CRcs.AIcs.CLRecentApr 23, 2026

Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

Jiali Wei, Ming Fan, Guoheng Sun, Xicheng Zhang +2 more

The paper introduces BadStyle, a novel backdoor attack framework that generates natural, stealthy poisoned samples using LLMs to compromise various LLMs with high success rates and robust activation.

View →
cs.CLcs.CRRecentMay 26, 2026

Prompt Injection Detection is Regime-Dependent: A Deployment-Aware Evaluation with Interpretable Structural Signals

Akindoyin Akinrele, Shreyank N Gowda

The paper evaluates prompt injection detection in a deployment-aware, multi-regime framework, finding that detection performance is highly dependent on the operational setting and that no single detec…

View →
cs.CRcs.AIRecentMay 11, 2026

When Prompts Become Payloads: A Framework for Mitigating SQL Injection Attacks in Large Language Model-Driven Applications

Farzad Nourmohammadzadeh Motlagh, Mehrdad Hajizadeh, Mehryar Majd, Pejman Najafi +2 more

The paper proposes a multi-layered security framework to detect and mitigate SQL injection attacks that occur when Large Language Models translate natural language prompts into database queries.

View →
cs.CRcs.AIcs.LGRecentMay 18, 2026

Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks

John T. Halloran, Noopur S. Bhatt

The paper proposes Open-Book Benign Rewriting (OBBR), a novel defense mechanism that uses LLM rewriting with benign samples to neutralize data poisoning attacks against LLMs, significantly improving s…

View →
cs.CRcs.AIRecentMay 1, 2026

A Sentence Relation-Based Approach to Sanitizing Malicious Instructions

Soumil Datta, Melissa Umble, Daniel S. Brown, Guanhong Tao

The paper introduces SONAR, a prompt sanitization framework that uses natural language inference metrics to identify and remove malicious instructions injected into LLM prompts, achieving near-zero at…

View →
cs.CRcs.AIRecentMar 26, 2026

PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems

Haozhen Wang, Haoyue Liu, Jionghao Zhu, Zhichao Wang +2 more

The paper introduces PIDP-Attack, a novel compound adversarial attack that combines prompt injection with database poisoning to manipulate Retrieval-Augmented Generation (RAG) systems against arbitrar…

View →
cs.CRcs.AIcs.CVRecentMay 27, 2026

Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security

Xiang Fang, Wanlong Fang

The paper proposes the Adversarial Prompt Disentanglement (APD) framework, a novel defense mechanism that proactively identifies and neutralizes malicious components in LLM prompts, achieving over 85%…

View →
cs.CRcs.AIcs.CVRecentMay 27, 2026

Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security

Xiang Fang, Wanlong Fang

The paper proposes the Adversarial Prompt Disentanglement (APD) framework, a novel defense that proactively identifies and neutralizes malicious components in LLM prompts, achieving over 85% reduction…

View →
cs.CRcs.AIcs.CLRecentMay 5, 2026

Exposing LLM Safety Gaps Through Mathematical Encoding:New Attacks and Systematic Analysis

Haoyu Zhang, Mohammad Zandsalimy, Shanu Sushmita

The paper demonstrates that encoding harmful prompts as genuine mathematical problems, rather than just using mathematical formatting, effectively bypasses the safety filters of large language models.

View →
cs.CRcs.AIRecentApr 23, 2026

Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models

Naheed Rayhan, Sohely Jahan

The paper introduces Transient Turn Injection (TTI), a novel multi-turn attack technique that exploits stateless moderation in LLMs by distributing adversarial intent across isolated interactions, rev…

View →
cs.LGcs.AIcs.CRRecentMay 27, 2026

SPARD: Defending Harmful Fine-Tuning Attack via Safety Projection with Relevance-Diversity Data Selection

Shuhao Chen, Weisen Jiang, Yeqi Gong, Shengda Luo +4 more

SPARD is a defense framework that uses Safety-Projected Alternating optimization and Relevance-Diversity data selection to protect large language models from harmful fine-tuning attacks, achieving sup…

View →
cs.LGcs.AIcs.CRRecentMay 27, 2026

SPARD: Defending Harmful Fine-Tuning Attack via Safety Projection with Relevance-Diversity Data Selection

Shuhao Chen, Weisen Jiang, Yeqi Gong, Shengda Luo +4 more

SPARD is a defense framework that uses Safety-Projected Alternating optimization and Relevance-Diversity data selection to mitigate harmful fine-tuning attacks that undermine LLM safety.

View →
cs.CRRecentMar 19, 2026

Prompt Control-Flow Integrity: A Priority-Aware Runtime Defense Against Prompt Injection in LLM Systems

Md Takrim Ul Alam, Akif Islam, Mohd Ruhul Ameen, Abu Saleh Musa Miah +1 more

The paper introduces Prompt Control-Flow Integrity (PCFI), a priority-aware runtime defense that models LLM prompts as structured segments to intercept prompt injection attacks with high accuracy and…

View →
cs.CRcs.CLRecentMay 14, 2026

MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

Rui Wen, Mark Russinovich, Andrew Paverd, Jun Sakuma +1 more

The paper introduces MetaBackdoor, a novel class of LLM backdoor attacks that exploits positional encoding (length-based triggers) rather than requiring modifications to the textual content.

View →
cs.CRcs.IRRecentMay 27, 2026

Can It Reach the Generator? Investigating the Survival of Prompt-Injection Attacks in Realistic RAG Settings

Yu Yin, Shuai Wang, Bevan Koopman, Guido Zuccon

This paper re-evaluates prompt-injection attacks in realistic RAG settings, finding that most prior attack methods fail to reach the generator, and that current attacks are easily detectable.

View →
cs.CRRecentMay 8, 2026

Cross-Modal Backdoors in Multimodal Large Language Models

Runhe Wang, Li Bai, Haibo Hu, Songze Li

The paper proposes a novel cross-modal backdoor attack that exploits the vulnerability of lightweight connectors in multimodal LLMs, demonstrating high attack success rates across different modalities…

View →