~ similar to 2605.31135v1· 20 results
Zirui Chen, Qi Zhan, Jiayuan Zhou, Xing Hu +2 more
This paper conducts a large-scale empirical study demonstrating that Java library exploits can accurately identify affected versions, achieving high recall and precision, and proposes strategies for e…
Fariha Tanjim Shifat, Hariswar Baburaj, Ce Zhou, Jaydeb Sarker +1 more
The paper analyzes GitHub security advisories for LLM-integrated open-source systems, finding that while most vulnerabilities map to existing code-level weaknesses, the architectural risks like Supply…
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
Mohammed Kharma, Ahmed Sabbah, Radi Jarrar, Samer Zain +2 more
The study found that providing developers with a layer-based security training package significantly reduces the number and severity of security vulnerabilities in LLM-assisted web application develop…
This study empirically evaluates the cryptographic security of LLM-generated Rust code, finding that while general analysis tools are insufficient, a custom crypto-specific analyzer successfully ident…
The paper introduces a novel, large-scale dataset of vulnerable code snippets linked to CAPEC and CWE, generated using advanced LLMs, to improve automatic vulnerability detection.
Hao Wang, Niels Mündler, Mark Vero, Jingxuan He +2 more
The paper introduces SecPI, a fine-tuning pipeline that teaches reasoning language models (RLMs) to autonomously internalize structured security reasoning, significantly improving secure code generati…
This study conducts a large-scale longitudinal analysis of CodeQL, finding that while the tool is effective at detecting vulnerabilities, its detection capabilities are not guaranteed to be stable acr…
This paper empirically evaluates the security of code generated by seven popular LLMs and finds that all evaluated models generate code containing critical or high-severity vulnerabilities.
This study empirically measures the consistency and success rate of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation capabilit…
This study empirically measures the consistency and effectiveness of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation rates am…
The paper introduces codebadger, a Model Context Protocol (MCP) server that integrates Joern's Code Property Graph (CPG) with LLMs, enabling large language models to perform large-scale, semantic prog…
Bushra Sabir, Shigang Liu, Seung Ick Jang, Sharif Abuadbba +5 more
The paper evaluates multi-LLM strategies for secure code generation, finding that hybrid pipelines combining ensembling, static analysis, and patching achieve the strongest security performance, outpe…
Yujie Ma, Jialin Rong, Chenxi Yang, Lili Quan +3 more
The paper addresses the gap in understanding real-world LLM-in-the-loop vulnerabilities by creating the LLMCVE dataset and demonstrating that these vulnerabilities are significantly harder to repair t…
The paper introduces the Mitigation-Aware Chain-of-Thought (MA-CoT) framework, which significantly enhances the security reliability of code generated by LLMs across multiple languages and models.
This paper analyzes a large corpus of research artifacts, finding that many contain insecure code patterns, and proposes SAFE, a novel framework for context-aware security assessment of these artifact…
The paper introduces a novel multi-LLM orchestration system combined with symbolic execution to successfully detect memory vulnerabilities in uncompilable, incomplete Rust CVE code snippets, achieving…
The paper proposes an automated, standardized framework to empirically compare the security quality of code generated through human-only, LLM-only, and hybrid collaboration methods.
Oliver Jacobsen, Tobias Kirsch, Haya Schulmann, Niklas Vogel +1 more
This paper analyzes RPKI specifications, demonstrating that vague or conflicting requirements in dozens of RFCs cause systemic vulnerabilities in real-world implementations, leading to 61 undocumented…
This study formally verified 3,500 AI-generated code artifacts and found that a majority (55.8%) contain exploitable security vulnerabilities, regardless of the LLM used.