~ similar to 2605.22058v1· 20 results
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
The paper introduces a novel multi-LLM orchestration system combined with symbolic execution to successfully detect memory vulnerabilities in uncompilable, incomplete Rust CVE code snippets, achieving…
Chengyan Ma, Jieke Shi, Ruidong Han, Ye Liu +3 more
The paper introduces TEERepair, a framework that automatically repairs severe security vulnerabilities caused by improper partitioning in Trusted Execution Environments (TEEs) by combining a domain-sp…
SAILOR automates the construction of symbolic execution harnesses by combining static analysis and LLM-based synthesis, significantly improving the scalability and effectiveness of vulnerability disco…
The paper tested the hypothesis that wrapping untrusted prompt inputs in mock tool calls would improve LLM robustness, but found that this technique generally fails and can even increase vulnerability…
COBALT-TLA introduces a neuro-symbolic verification loop that successfully and autonomously discovers novel cross-chain bridge vulnerabilities by integrating an LLM with the TLA+ model checker.
The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…
The paper introduces Heimdall, an automated pipeline that uses LLMs and formal verification to safely and automatically migrate legacy, potentially buggy eBPF programs written in C to memory-safe Rust…
This study formally verified 3,500 AI-generated code artifacts and found that a majority (55.8%) contain exploitable security vulnerabilities, regardless of the LLM used.
Hao Wang, Niels Mündler, Mark Vero, Jingxuan He +2 more
The paper introduces SecPI, a fine-tuning pipeline that teaches reasoning language models (RLMs) to autonomously internalize structured security reasoning, significantly improving secure code generati…
Houjun Liu, Lisa Einstein, John Yang, Joachim Baumann +4 more
SecureForge is an automated pipeline that significantly reduces cybersecurity vulnerabilities in LLM-generated code by optimizing system prompts, achieving up to a 48% reduction in output vulnerabilit…
The paper proposes a general, compiler-integrated framework for secure content composition that minimizes the syntactic difference between secure and insecure coding practices.
The paper introduces dynamic, per-request separator generation for Polymorphic Prompt Assembling (PPA), significantly reducing the blast-radius vulnerability to prompt injection attacks by ensuring un…
The paper introduces the Mitigation-Aware Chain-of-Thought (MA-CoT) framework, which significantly enhances the security reliability of code generated by LLMs across multiple languages and models.
The paper proposes an attestation-aware promotion gate to mitigate supply-chain risks in LLM pipelines by cryptographically verifying and enforcing claims about training and release artifacts before d…
The paper introduces a systematic benchmark to test LLMs' ability to recover Indicators of Compromise (IoCs) from JavaScript code, finding that while LLMs handle simple obfuscation well, encryption-ba…
Minor, single-character perturbations to prompts can significantly degrade the security of code generated by LLMs, suggesting that prompt fragility is a major security concern beyond simple prompt inj…
The paper introduces a novel toolkit to enhance RISC-V Trusted Execution Environments (TEEs) by adding modular extensions for secure enclave update, migration, state continuity, and trusted time, ther…
The paper introduces 'abliteration,' a weight editing technique that successfully bypasses the refusal mechanism of safety-aligned Code LLMs, enabling scalable synthesis of vulnerable code from safe i…
Shenao Yan, Shimaa Ahmed, Shan Jin, Sunpreet S. Arora +3 more
The paper introduces CodeScan, a novel black-box framework that detects data poisoning in code generation LLMs by analyzing structural similarities across multiple generations to identify recurring, v…