~ similar to 2604.12994v2· 20 results
Yujie Ma, Jialin Rong, Chenxi Yang, Lili Quan +3 more
The paper addresses the gap in understanding real-world LLM-in-the-loop vulnerabilities by creating the LLMCVE dataset and demonstrating that these vulnerabilities are significantly harder to repair t…
VulKey introduces a novel LLM-based framework that uses a hierarchical abstraction of expert security knowledge to guide automatic vulnerability repair, achieving state-of-the-art performance on real-…
Xinran Zheng, Alfredo Pesoli, Marco Valleri, Suman Jana +1 more
Veritas is a semantically grounded framework that detects memory corruption vulnerabilities in stripped binaries by combining static analysis, LLM-based reasoning, and runtime validation, achieving hi…
Simiao Liu, Fang Liu, Li Zhang, Yang Liu +1 more
ContraFix is an agentic framework that improves automated vulnerability repair by using differential runtime evidence to pinpoint the root cause of bugs, achieving state-of-the-art performance on majo…
The paper introduces codebadger, a Model Context Protocol (MCP) server that integrates Joern's Code Property Graph (CPG) with LLMs, enabling large language models to perform large-scale, semantic prog…
SAILOR automates the construction of symbolic execution harnesses by combining static analysis and LLM-based synthesis, significantly improving the scalability and effectiveness of vulnerability disco…
This paper identifies the 'Format-Reliability Gap'—where LLMs know about code vulnerabilities but generate insecure code anyway—and proposes a localized, per-vulnerability steering vector fix that sig…
The paper proposes a Residual Risk Scoring (RRS) framework that uses combined semantic and structural similarity analysis to estimate potential residual security risks in code after patching, finding…
Parteek Jamwal, Minghao Shao, Boyuan Chen, Achyuta Muthuvelan +14 more
The paper introduces RAVEN, a Retrieval-Augmented Vulnerability Exploration Network, which uses LLM agents and RAG to automatically generate comprehensive, structured vulnerability analysis reports fo…
The paper introduces an execution-grounded, cross-language framework that significantly improves the reliability of LLM-driven code vulnerability analysis by ensuring that all proposed fixes are confi…
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
The paper introduces a novel multi-LLM orchestration system combined with symbolic execution to successfully detect memory vulnerabilities in uncompilable, incomplete Rust CVE code snippets, achieving…
Nils Loose, Joseph Bienhüls, Kristoffer Hempel, Felix Mächtle +1 more
The paper evaluates code language model-based detection of vulnerability-fixing commits (VFCs) using a unified benchmark and concludes that code changes alone are insufficient for accurate detection,…
VeriCWEty proposes an embedding-based framework to detect and classify common software vulnerabilities (CWEs) in Verilog RTL code at both module and line levels, achieving high detection accuracy.
Ze Sheng, Zhicheng Chen, Qingxiao Xu, Kewen Zhu +1 more
FuzzingBrain V2 is a multi-agent LLM system that significantly improves automated vulnerability discovery by ensuring all reported bugs are fuzzer-reproducible and handling complex cross-function depe…
The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…
NeuroLog is a novel, build-free neuro-symbolic pipeline that combines LLM-derived dataflow facts, Datalog, and SMT solving to systematically discover and synthesize exploitable memory safety vulnerabi…
Kevin Lira, Baldoino Fonseca, Davy Baía, Márcio Ribeiro +1 more
This study assesses the effectiveness and cost of four modern LLMs in detecting vulnerabilities that span multiple functions (interprocedural dependencies), finding that Gemini 3 Flash offers strong c…
Hao Wang, Niels Mündler, Mark Vero, Jingxuan He +2 more
The paper introduces SecPI, a fine-tuning pipeline that teaches reasoning language models (RLMs) to autonomously internalize structured security reasoning, significantly improving secure code generati…
This study formally verified 3,500 AI-generated code artifacts and found that a majority (55.8%) contain exploitable security vulnerabilities, regardless of the LLM used.