Papers similar to 2605.20641v1

~ similar to 2605.20641v1· 20 results

cs.CRcs.CLRecentApr 14, 2026

Compiling Activation Steering into Weights via Null-Space Constraints for Stealthy Backdoors

Rui Yin, Tianxu Han, Naen Xu, Changjiang Li +7 more

The paper proposes a novel method to inject reliable, sustained backdoors into LLMs by compiling an activation steering vector into model weights, ensuring the backdoor only activates upon a specific…

View →

cs.CRcs.AIcs.CLRecentApr 23, 2026

Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers

Jiali Wei, Ming Fan, Guoheng Sun, Xicheng Zhang +2 more

The paper introduces BadStyle, a novel backdoor attack framework that generates natural, stealthy poisoned samples using LLMs to compromise various LLMs with high success rates and robust activation.

View →

cs.CRcs.ARcs.LGRecentMay 11, 2026

LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges

Johann Knechtel, Ozgur Sinanoglu, Ramesh Karri

This review analyzes the dual impact of integrating Large Language Models (LLMs) into hardware design, detailing both their transformative potential in EDA and the critical security vulnerabilities th…

View →

cs.CRcs.LGRecentMay 28, 2026

Dissecting the Black Box: Circuit-Level Analysis of LLM Vulnerability Detection

Syafiq Al Atiiq, Chun Zhou, Christian Gehrmann

The paper analyzes LLM vulnerability detection using mechanistic interpretability, finding that models primarily rely on safety detectors rather than direct vulnerability signature recognition.

View →

cs.CRcs.CLRecentMay 14, 2026

MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

Rui Wen, Mark Russinovich, Andrew Paverd, Jun Sakuma +1 more

The paper introduces MetaBackdoor, a novel class of LLM backdoor attacks that exploits positional encoding (length-based triggers) rather than requiring modifications to the textual content.

View →

cs.CRRecentMay 5, 2026

The Infinite Mutation Engine? Measuring Polymorphism in LLM-Generated Offensive Code

Gabriel Hortea, Juan Tapiador

This paper quantifies the polymorphic capacity of a commercial LLM, demonstrating that it can cheaply generate large populations of structurally diverse, yet behaviorally equivalent, offensive code pa…

View →

cs.CRRecentMay 15, 2026

uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs

Debopriya Roy Dipta, Thore Tiemann, Eduard Marin, Thomas Eisenbarth +1 more

The paper introduces uGen, the first LLM-driven framework that uses a retrieval-augmented, multi-agent design to automatically generate functionally correct microarchitectural attack Proof-of-Concepts…

View →

cs.CRcs.AIRecentApr 1, 2026

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Anubhab Sahu, Diptisha Samanta, Reza Soosahabi

The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…

View →

cs.CRcs.CLcs.SERecentMay 28, 2026

Minimal Prompt Perturbations Lead to Code Vulnerabilities: Prompt Fragility and Hidden-State Signals in Coding LLMs

Alexander Sternfeld, Andrei Kucharavy, Ljiljana Dolamic

Minor, single-character perturbations to prompts can significantly degrade the security of code generated by LLMs, suggesting that prompt fragility is a major security concern beyond simple prompt inj…

View →

cs.CRcs.LORecentApr 14, 2026

COBALT-TLA: A Neuro-Symbolic Verification Loop for Cross-Chain Bridge Vulnerability Discovery

Dominik Blain

COBALT-TLA introduces a neuro-symbolic verification loop that successfully and autonomously discovers novel cross-chain bridge vulnerabilities by integrating an LLM with the TLA+ model checker.

View →

cs.PLcs.CRRecentMay 15, 2026

Compile-time Security Analysis and Optimization of Sensitive String Producers

Mike Samuel, Tom Palmer, Shaw Summa, Robert Grayson

The paper proposes a general, compiler-integrated framework for secure content composition that minimizes the syntactic difference between secure and insecure coding practices.

View →

cs.CRcs.AIRecentApr 4, 2026

SecPI: Secure Code Generation with Reasoning Models via Security Reasoning Internalization

Hao Wang, Niels Mündler, Mark Vero, Jingxuan He +2 more

The paper introduces SecPI, a fine-tuning pipeline that teaches reasoning language models (RLMs) to autonomously internalize structured security reasoning, significantly improving secure code generati…

View →

cs.CRcs.AIcs.LGEmpiricalRecentJul 28, 2026

Architectural Backdoors in Vision-Language Model Supply Chains via Representation Steering

Maria Rosaria Briglia, Igor Maljkovic, Antonio Emanuele Cinà, Luca Oneto +2 more

Researchers demonstrate how malicious providers can embed architectural backdoors into Vision--Language Model (VLM) supply chains through representation steering.

View →

cs.AIRecentMay 28, 2026

Opt-Verifier: Unleashing the Power of LLMs for Optimization Modeling via Dual-Side Verification

Haoyang Liu, Jie Wang, Boxuan Niu, Xiongwei Han +7 more

The paper introduces Opt-Verifier, a novel LLM-based framework that significantly improves the accuracy of automated optimization model generation by implementing dual-side verification from both stru…

View →

cs.CRcs.AIRecentApr 30, 2026

Secret Stealing Attacks on Local LLM Fine-Tuning through Supply-Chain Model Code Backdoors

Zi Li, Tian Zhou, Wenze Li, Jingyu Hua +2 more

This paper introduces a novel supply-chain attack that uses model code backdoors to actively steal sensitive secrets from local LLM fine-tuning datasets, bypassing current privacy defenses.

View →

cs.CRcs.AIRecentMar 31, 2026

Security in LLM-as-a-Judge: A Comprehensive SoK

Aiman Al Masoud, Antony Anju, Marco Arazzi, Mert Cihangiroglu +5 more

This paper provides the first comprehensive Systematization of Knowledge (SoK) on the security aspects of LLM-as-a-Judge (LaaJ) systems, identifying key vulnerabilities and proposing a taxonomy for fu…

View →

cs.CRcs.LGRecentApr 22, 2026

Breaking Bad: Interpretability-Based Safety Audits of State-of-the-Art LLMs

Krishiv Agarwal, Ramneet Kaur, Colin Samplawski, Manoj Acharya +5 more

The paper conducts an interpretability-driven safety audit of eight state-of-the-art LLMs, demonstrating that while interpretability-based steering is a powerful auditing tool, model robustness varies…

View →

cs.CRRecentApr 1, 2026

When Safe Models Merge into Danger: Exploiting Latent Vulnerabilities in LLM Fusion

Jiaqing Li, Zhibo Zhang, Shide Zhou, Yuxi Li +2 more

The paper introduces TrojanMerge, a framework demonstrating that model merging can be exploited to systematically compromise the safety alignment of multiple individually safe LLMs.

View →