~ similar to 2604.17313v1· 20 results
The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…
The security of LLM agents is critically dependent on their system prompt configuration, which creates a brittle attack surface that can be exploited by attackers inverting the prompt's core assumptio…
AttackEval systematically evaluates the effectiveness of 250 prompt injection prompts across ten attack categories, finding that composite and obfuscation attacks are highly effective against current…
This paper demonstrates that by applying systematic prompting and retrieval techniques, local open-weight LLMs can significantly enhance their capabilities to autonomously perform Linux privilege esca…
Karima Makhlouf, Lamiaa Basyoni, Syed Khaderi, Gabriel Marquez +3 more
This paper conducts a structured ablation study using a unified threat model to evaluate how various system factors (like model architecture and retrieval configuration) influence different types of p…
Paulo Ricardo Ferreira Neves, Edson Rodrigues da Cruz Filho, Paulo Henrique Eleuterio Falsetti, João Vitor Pavan +6 more
GuardNet is a lightweight, ensemble-based guardrail system using shallow neural networks that provides robust and efficient detection of Prompt Injection and Jailbreak attacks on LLMs, suitable for pr…
This paper addresses the vulnerability of existing LLM safety monitors to adaptive attackers and proposes activation watermarking, a technique that significantly improves detection robustness against…
The paper empirically evaluates domain-adapted and general-purpose LLMs for structured threat modelling (STRIDE on 5G security), finding that domain adaptation and model size do not guarantee reliable…
The paper evaluates prompt-injection defenses for educational LLM tutors, demonstrating that optimal security requires balancing adversarial robustness, usability, and latency, and proposing a compreh…
The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…
Lixing Lin, Juli You, Yue Li, Luyun Lin +3 more
Reflect-Guard enhances LLM safety classifiers by integrating logical self-reflection, significantly improving detection of sophisticated adversarial jailbreak prompts.
Priyal Deep, Shane Emmons, Amy Fox, Kyle Bacon +3 more
The paper evaluates prompt injection defenses and finds that only external output filtering, implemented in application code, reliably prevents secret leaks from LLMs, demonstrating that model-based d…
Zihan Wang, Rui Zhang, Yu Liu, Chi Liu +3 more
This paper presents the first systematic study of black-box skill stealing attacks against proprietary LLM agents, demonstrating that structured agent skills can be easily extracted, posing a signific…
The paper introduces SecureBreak, a manually annotated, safety-oriented dataset designed to help detect harmful outputs from large language models (LLMs) that bypass existing security alignments.
Bushra Sabir, Shigang Liu, Seung Ick Jang, Sharif Abuadbba +5 more
The paper evaluates multi-LLM strategies for secure code generation, finding that hybrid pipelines combining ensembling, static analysis, and patching achieve the strongest security performance, outpe…
Yunhao Feng, Xiaohu Du, Xinhao Deng, Yifan Ding +12 more
BraveGuard is a self-evolving defense framework that significantly improves the safety monitoring of computer-use agents by generating guard model supervision from open-world threat discovery and real…
Yunhao Feng, Yifan Ding, Xiaohu Du, Ming Wen +12 more
BraveGuard is a self-evolving defense framework that improves the safety of computer-use agents by training guard models on open-world, multi-step threat trajectories rather than static benchmarks.
The paper identifies a critical vulnerability, the Camouflage Detection Gap (CDG), where standard LLM injection detectors fail dramatically when malicious payloads mimic the target domain's language a…
The paper demonstrates that encoding harmful prompts as genuine mathematical problems, rather than just using mathematical formatting, effectively bypasses the safety filters of large language models.
This paper proposes a structured pipeline using LLMs to generate and evaluate obfuscated XSS payloads, demonstrating that while LLMs can generate samples, they currently struggle to ensure payloads ma…