Papers similar to 2603.17123v1

~ similar to 2603.17123v1· 20 results

cs.CRcs.AIRecentMay 11, 2026

Threat Modelling using Domain-Adapted Language Models: Empirical Evaluation and Insights

Saba Pourhanifeh, AbdulAziz AbdulGhaffar, Ashraf Matrawy

The paper empirically evaluates domain-adapted and general-purpose LLMs for structured threat modelling (STRIDE on 5G security), finding that domain adaptation and model size do not guarantee reliable…

View →

cs.CRcs.AIRecentApr 1, 2026

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Anubhab Sahu, Diptisha Samanta, Reza Soosahabi

The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…

View →

cs.CRcs.AIRecentMay 28, 2026

How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency

Galip Tolga Erdem

This study empirically measures the consistency and success rate of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation capabilit…

View →

cs.CRcs.AIRecentMay 28, 2026

How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency

Galip Tolga Erdem

This study empirically measures the consistency and effectiveness of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation rates am…

View →

cs.CRcs.SERecentMar 24, 2026

Does Teaming-Up LLMs Improve Secure Code Generation? A Comprehensive Evaluation with Multi-LLMSecCodeEval

Bushra Sabir, Shigang Liu, Seung Ick Jang, Sharif Abuadbba +5 more

The paper evaluates multi-LLM strategies for secure code generation, finding that hybrid pipelines combining ensembling, static analysis, and patching achieve the strongest security performance, outpe…

View →

cs.CRcs.AIcs.LGRecentMay 8, 2026

Defense effectiveness across architectural layers: a mechanistic evaluation of persistent memory attacks on stateful LLM agents

Jun Wen Leong

The paper systematically evaluates various defense mechanisms against persistent memory attacks on LLM agents, finding that only tool-gating at the memory layer (Memory Sandbox) effectively mitigates…

View →

cs.CRRecentApr 19, 2026

GuardPhish: Securing Open-Source LLMs from Phishing Abuse

Rina Mishra, Gaurav Varshney, Doddipatla Sesha Sahithi

The paper introduces GuardPhish, a large-scale dataset and evaluation framework, demonstrating that even high-performing open-source LLMs can generate actionable phishing content despite accurate inte…

View →

cs.CRcs.AIcs.LGRecentMay 24, 2026

Security in the Fine-Tuning Lifecycle of Large Language Models: Threats, Defenses,Evaluation, and Future Directions

Wenjuan Li, Yitao Liu, Runze Chen, Rajkumar Buyya

This paper provides a systematic, lifecycle-based framework for analyzing security threats and defenses across the entire fine-tuning process of LLMs, revealing that attack effectiveness is highly mod…

View →

cs.CRcs.AIRecentMay 17, 2026

When Efficiency Backfires: Cascading LLMs Trigger Cascade Failure under Adversarial Attack

Zehan Sun, Dingfan Chen, Songze Li

This paper demonstrates that LLM cascade systems, designed for efficiency, are vulnerable to targeted adversarial attacks that simultaneously degrade both performance and cost-efficiency.

View →

cs.CRcs.AIRecentMar 31, 2026

Security in LLM-as-a-Judge: A Comprehensive SoK

Aiman Al Masoud, Antony Anju, Marco Arazzi, Mert Cihangiroglu +5 more

This paper provides the first comprehensive Systematization of Knowledge (SoK) on the security aspects of LLM-as-a-Judge (LaaJ) systems, identifying key vulnerabilities and proposing a taxonomy for fu…

View →

cs.CRcs.CVRecentMar 18, 2026

Toward Reliable, Safe, and Secure LLMs for Scientific Applications

Saket Sanjeev Chaturvedi, Joshua Bergerson, Tanwi Mallick

This paper addresses the critical need for trustworthy LLMs in science by proposing a comprehensive, multi-layered defense framework and methodology to evaluate unique scientific vulnerabilities.

View →

cs.CRcs.AIcs.CYRecentMar 24, 2026

Robust Safety Monitoring of Language Models via Activation Watermarking

Toluwani Aremu, Daniil Ognev, Samuele Poppi, Nils Lukas

This paper addresses the vulnerability of existing LLM safety monitors to adaptive attackers and proposes activation watermarking, a technique that significantly improves detection robustness against…

View →

cs.CRcs.AIRecentApr 11, 2026

Like a Hammer, It Can Build, It Can Break: Large Language Model Uses, Perceptions, and Adoption in Cybersecurity Operations on Reddit

Souradip Nath, Chih-Yi Huang, Aditi Ganapathi, Kashyap Thimmaraju +2 more

Analyzing Reddit discussions, the paper finds that while security practitioners see LLMs as useful for boosting productivity, their adoption is constrained by concerns over reliability, verification,…

View →

cs.CRcs.AIRecentMay 4, 2026

On the Privacy of LLMs: An Ablation Study

Karima Makhlouf, Lamiaa Basyoni, Syed Khaderi, Gabriel Marquez +3 more

This paper conducts a structured ablation study using a unified threat model to evaluate how various system factors (like model architecture and retrieval configuration) influence different types of p…

View →

cs.CRcs.CLRecentMay 14, 2026

Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

Karthik Raghu Iyer, Yazdan Jamshidi, Nicholas Bray, Alexey A. Shvets

The paper introduces a comprehensive taxonomy and auditing framework to assess the collective coverage of existing LLM attack benchmarks, revealing significant and systematic gaps in current testing m…

View →

cs.CRcs.LGRecentMay 28, 2026

Dissecting the Black Box: Circuit-Level Analysis of LLM Vulnerability Detection

Syafiq Al Atiiq, Chun Zhou, Christian Gehrmann

The paper analyzes LLM vulnerability detection using mechanistic interpretability, finding that models primarily rely on safety detectors rather than direct vulnerability signature recognition.

View →

cs.CRcs.AIcs.LGRecentMay 22, 2026

An Empirical Evaluation of LLM-Generated Code Security Across Prompting Methods

Mohammed Kharma, Ahmed Sabbah, Mohammad Alkhanafseh, Mohammad Hammoudeh +1 more

The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…

View →

cs.CRcs.AIRecentMar 30, 2026

Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey

Bhavuk Jain, Sercan Ö. Arık, Hardeo K. Thakur

This survey provides a comprehensive taxonomy and vulnerability-centric analysis of adversarial attacks targeting Multimodal Large Language Models (MLLMs), offering an explanatory framework for enhanc…

View →

cs.CRcs.AIcs.CLRecentMar 23, 2026

SecureBreak -- A dataset towards safe and secure models

Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera

The paper introduces SecureBreak, a manually annotated, safety-oriented dataset designed to help detect harmful outputs from large language models (LLMs) that bypass existing security alignments.

View →

cs.CRcs.CLRecentApr 24, 2026

Training a General Purpose Automated Red Teaming Model

Aishwarya Padmakumar, Leon Derczynski, Traian Rebedea, Christopher Parisien

The paper proposes a general-purpose pipeline to train automated red teaming models capable of generating attacks for arbitrary adversarial goals, overcoming the limitations of current methods that ar…

View →