~ similar to 2604.05292v2· 20 results
Minor, single-character perturbations to prompts can significantly degrade the security of code generated by LLMs, suggesting that prompt fragility is a major security concern beyond simple prompt inj…
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
The paper presents Broken Quantum, a comprehensive formal security audit that identifies 547 security vulnerabilities across 45 open-source quantum computing simulators, revealing critical flaws in me…
Hao Wang, Niels Mündler, Mark Vero, Jingxuan He +2 more
The paper introduces SecPI, a fine-tuning pipeline that teaches reasoning language models (RLMs) to autonomously internalize structured security reasoning, significantly improving secure code generati…
The paper introduces False Security Confidence (FSC), a new metric to measure the inherent prevalence of security vulnerabilities in code generated by LLMs that are otherwise functionally correct, eve…
SAILOR automates the construction of symbolic execution harnesses by combining static analysis and LLM-based synthesis, significantly improving the scalability and effectiveness of vulnerability disco…
The paper introduces COBALT, a Z3 SMT-based formal verification engine, to proactively detect arithmetic vulnerabilities (CWE-190/191/195) in the critical infrastructure surrounding frontier AI models…
The paper proposes a general, compiler-integrated framework for secure content composition that minimizes the syntactic difference between secure and insecure coding practices.
COBALT-TLA introduces a neuro-symbolic verification loop that successfully and autonomously discovers novel cross-chain bridge vulnerabilities by integrating an LLM with the TLA+ model checker.
Houjun Liu, Lisa Einstein, John Yang, Joachim Baumann +4 more
SecureForge is an automated pipeline that significantly reduces cybersecurity vulnerabilities in LLM-generated code by optimizing system prompts, achieving up to a 48% reduction in output vulnerabilit…
The paper provides a formal proof that a single C program can contain a countably infinite number of distinct, independently assignable software vulnerabilities, suggesting the set of all software vul…
This paper identifies the 'Format-Reliability Gap'—where LLMs know about code vulnerabilities but generate insecure code anyway—and proposes a localized, per-vulnerability steering vector fix that sig…
The paper introduces TRAILS~, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…
Chengyan Ma, Jieke Shi, Ruidong Han, Ye Liu +2 more
The paper introduces SymTEE, an LLM-assisted symbolic execution framework that detects missing input validation vulnerabilities in TEE applications without needing complex, real TEE setups.
The paper introduces a novel multi-LLM orchestration system combined with symbolic execution to successfully detect memory vulnerabilities in uncompilable, incomplete Rust CVE code snippets, achieving…
The paper introduces MOSAIC-Bench, a benchmark demonstrating that coding agents can ship exploitable code by complying with seemingly innocuous, staged tasks, a vulnerability that is not easily mitiga…
This study empirically evaluates the cryptographic security of LLM-generated Rust code, finding that while general analysis tools are insufficient, a custom crypto-specific analyzer successfully ident…
Yuwei Liu, Xinyi Wan, Yanhao Wang, Minghua Wang +2 more
KVerus is a retrieval-augmented system that significantly improves the scalability and resilience of formal verification for Rust code by managing complex cross-module dependencies and adapting to cod…
AutoSOUP is a system that automates component-level memory-safety verification by generating Safety-Oriented Unit Proofs, leveraging a hybrid LLM-based architecture to overcome manual workflow limitat…
Parteek Jamwal, Minghao Shao, Boyuan Chen, Achyuta Muthuvelan +14 more
The paper introduces RAVEN, a Retrieval-Augmented Vulnerability Exploration Network, which uses LLM agents and RAG to automatically generate comprehensive, structured vulnerability analysis reports fo…