~ similar to 2603.22717v1· 20 results
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
The paper introduces the Mitigation-Aware Chain-of-Thought (MA-CoT) framework, which significantly enhances the security reliability of code generated by LLMs across multiple languages and models.
Li Huang, Zhongxin Liu, Yifan Wu, Tao Yin +5 more
DeepGuard introduces a novel multi-layer semantic aggregation framework to enhance secure code generation by collecting vulnerability cues from multiple upper layers of LLMs, significantly improving s…
This paper identifies the 'Format-Reliability Gap'—where LLMs know about code vulnerabilities but generate insecure code anyway—and proposes a localized, per-vulnerability steering vector fix that sig…
The paper introduces a novel, large-scale dataset of vulnerable code snippets linked to CAPEC and CWE, generated using advanced LLMs, to improve automatic vulnerability detection.
This paper empirically evaluates the security of code generated by seven popular LLMs and finds that all evaluated models generate code containing critical or high-severity vulnerabilities.
The paper proposes an automated, standardized framework to empirically compare the security quality of code generated through human-only, LLM-only, and hybrid collaboration methods.
Yujie Ma, Jialin Rong, Chenxi Yang, Lili Quan +3 more
The paper addresses the gap in understanding real-world LLM-in-the-loop vulnerabilities by creating the LLMCVE dataset and demonstrating that these vulnerabilities are significantly harder to repair t…
Houjun Liu, Lisa Einstein, John Yang, Joachim Baumann +4 more
SecureForge is an automated pipeline that significantly reduces cybersecurity vulnerabilities in LLM-generated code by optimizing system prompts, achieving up to a 48% reduction in output vulnerabilit…
Fariha Tanjim Shifat, Hariswar Baburaj, Ce Zhou, Jaydeb Sarker +1 more
The paper analyzes GitHub security advisories for LLM-integrated open-source systems, finding that while most vulnerabilities map to existing code-level weaknesses, the architectural risks like Supply…
Minor, single-character perturbations to prompts can significantly degrade the security of code generated by LLMs, suggesting that prompt fragility is a major security concern beyond simple prompt inj…
Hao Wang, Niels Mündler, Mark Vero, Jingxuan He +2 more
The paper introduces SecPI, a fine-tuning pipeline that teaches reasoning language models (RLMs) to autonomously internalize structured security reasoning, significantly improving secure code generati…
The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…
This paper benchmarks LLMs for smart contract security analysis, concluding that while LLMs show potential, their reliability is limited by lexical bias and requires integration with traditional stati…
The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…
This study empirically evaluates the cryptographic security of LLM-generated Rust code, finding that while general analysis tools are insufficient, a custom crypto-specific analyzer successfully ident…
The paper introduces an efficient, lightweight LLM framework for smart contract auditing that decouples the audit process into multiple components, achieving high accuracy while significantly reducing…
Youness Bouchari, Matteo Boffa, Marco Mellia, Idilio Drago +2 more
The paper re-evaluates LLM agents on CTFs, finding that while general-purpose agents like claude-code are strong baselines, specialized, modular architectures significantly improve performance and con…
The paper introduces a novel multi-LLM orchestration system combined with symbolic execution to successfully detect memory vulnerabilities in uncompilable, incomplete Rust CVE code snippets, achieving…
Meifang Chen, Zhe Yang, Huang Nianchen, Yizhan Huang +3 more
This paper investigates how Byte-Pair Encoding (BPE) tokenization causes Code LLMs to disproportionately memorize certain types of secrets, a phenomenon termed 'gibberish bias'.