~ similar to 2604.27714v1· 20 results
The paper introduces 'abliteration,' a weight editing technique that successfully bypasses the refusal mechanism of safety-aligned Code LLMs, enabling scalable synthesis of vulnerable code from safe i…
The paper analyzes LLM vulnerability detection using mechanistic interpretability, finding that models primarily rely on safety detectors rather than direct vulnerability signature recognition.
This paper identifies the 'Format-Reliability Gap'—where LLMs know about code vulnerabilities but generate insecure code anyway—and proposes a localized, per-vulnerability steering vector fix that sig…
Minor, single-character perturbations to prompts can significantly degrade the security of code generated by LLMs, suggesting that prompt fragility is a major security concern beyond simple prompt inj…
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
The paper introduces False Security Confidence (FSC), a new metric to measure the inherent prevalence of security vulnerabilities in code generated by LLMs that are otherwise functionally correct, eve…
This paper proposes a structured pipeline using LLMs to generate and evaluate obfuscated XSS payloads, demonstrating that while LLMs can generate samples, they currently struggle to ensure payloads ma…
Shenao Yan, Shimaa Ahmed, Shan Jin, Sunpreet S. Arora +3 more
The paper introduces CodeScan, a novel black-box framework that detects data poisoning in code generation LLMs by analyzing structural similarities across multiple generations to identify recurring, v…
This systematic mapping survey reviews label-efficient approaches for code vulnerability detection, synthesizing five paradigm families and providing a decision guide to navigate trade-offs.
The paper compares two sparse autoencoder architectures, finding that Differential SAEs (Diff-SAE) significantly outperform Crosscoders in isolating backdoor-related features in language models.
The paper introduces the Mitigation-Aware Chain-of-Thought (MA-CoT) framework, which significantly enhances the security reliability of code generated by LLMs across multiple languages and models.
The paper demonstrates that encoding harmful prompts as genuine mathematical problems, rather than just using mathematical formatting, effectively bypasses the safety filters of large language models.
The paper proposes a Residual Risk Scoring (RRS) framework that uses combined semantic and structural similarity analysis to estimate potential residual security risks in code after patching, finding…
Hao Wang, Niels Mündler, Mark Vero, Jingxuan He +2 more
The paper introduces SecPI, a fine-tuning pipeline that teaches reasoning language models (RLMs) to autonomously internalize structured security reasoning, significantly improving secure code generati…
VulStyle introduces a multi-modal model that jointly encodes source code, non-terminal AST structure, and code stylometry features to achieve state-of-the-art performance in software vulnerability det…
The paper introduces TRAILS~, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…
The paper tested the hypothesis that wrapping untrusted prompt inputs in mock tool calls would improve LLM robustness, but found that this technique generally fails and can even increase vulnerability…
The paper introduces an execution-grounded, cross-language framework that significantly improves the reliability of LLM-driven code vulnerability analysis by ensuring that all proposed fixes are confi…
The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…
The paper introduces the Safety Asymmetry Score (SAS) to measure how a model's vulnerability to adversarial content changes based on whether the malicious input arrives via the user message, tool meta…