Papers similar to 2604.12994v2

~ similar to 2604.12994v2· 20 results

cs.SEcs.CRRecentMay 27, 2026

Towards Demystifying and Repairing LLM-in-the-Loop Vulnerabilities

Yujie Ma, Jialin Rong, Chenxi Yang, Lili Quan +3 more

The paper addresses the gap in understanding real-world LLM-in-the-loop vulnerabilities by creating the LLMCVE dataset and demonstrating that these vulnerabilities are significantly harder to repair t…

View →

cs.CRcs.SERecentMay 3, 2026

VulKey: Automated Vulnerability Repair Guided by Domain-Specific Repair Patterns

Jia Li, Zhuangbin Chen, Yuxin Su, Michael R. Lyu

VulKey introduces a novel LLM-based framework that uses a hierarchical abstraction of expert security knowledge to guide automatic vulnerability repair, achieving state-of-the-art performance on real-…

View →

cs.SEcs.CRRecentMay 14, 2026

Veritas: A Semantically Grounded Agentic Framework for Memory Corruption Vulnerability Detection in Binaries

Xinran Zheng, Alfredo Pesoli, Marco Valleri, Suman Jana +1 more

Veritas is a semantically grounded framework that detects memory corruption vulnerabilities in stripped binaries by combining static analysis, LLM-based reasoning, and runtime validation, achieving hi…

View →

cs.SEcs.AIcs.CLRecentMay 17, 2026

ContraFix: Agentic Vulnerability Repair via Differential Runtime Evidence and Skill Reuse

Simiao Liu, Fang Liu, Li Zhang, Yang Liu +1 more

ContraFix is an agentic framework that improves automated vulnerability repair by using differential runtime evidence to pinpoint the root cause of bugs, achieving state-of-the-art performance on majo…

View →

cs.CRRecentMar 25, 2026

Bridging Code Property Graphs and Language Models for Program Analysis

Ahmed Lekssays

The paper introduces codebadger, a Model Context Protocol (MCP) server that integrates Joern's Code Property Graph (CPG) with LLMs, enabling large language models to perform large-scale, semantic prog…

View →

cs.CRcs.SERecentApr 7, 2026

Guiding Symbolic Execution with Static Analysis and LLMs for Vulnerability Discovery

Md Shafiuzzaman, Achintya Desai, Wenbo Guo, Tevfik Bultan

SAILOR automates the construction of symbolic execution harnesses by combining static analysis and LLM-based synthesis, significantly improving the scalability and effectiveness of vulnerability disco…

View →

cs.CRcs.LGRecentApr 17, 2026

Surgical Repair of Insecure Code Generation in LLMs

Gustavo Sandoval, Brendan Dolan-Gavitt, Siddharth Garg

This paper identifies the 'Format-Reliability Gap'—where LLMs know about code vulnerabilities but generate insecure code anyway—and proposes a localized, per-vulnerability steering vector fix that sig…

View →

cs.SEcs.CRRecentApr 22, 2026

Residual Risk Analysis in Benign Code: How Far Are We? A Multi-Model Semantic and Structural Similarity Approach

Mohammad Farhad, Shuvalaxmi Dass

The paper proposes a Residual Risk Scoring (RRS) framework that uses combined semantic and structural similarity analysis to estimate potential residual security risks in code after patching, finding…

View →

cs.CRcs.AIcs.MARecentApr 20, 2026

RAVEN: Retrieval-Augmented Vulnerability Exploration Network for Memory Corruption Analysis in User Code and Binary Programs

Parteek Jamwal, Minghao Shao, Boyuan Chen, Achyuta Muthuvelan +14 more

The paper introduces RAVEN, a Retrieval-Augmented Vulnerability Exploration Network, which uses LLM agents and RAG to automatically generate comprehensive, structured vulnerability analysis reports fo…

View →

cs.SEcs.AIcs.CRRecentApr 12, 2026

Verify Before You Fix: Agentic Execution Grounding for Trustworthy Cross-Language Code Analysis

Jugal Gajjar

The paper introduces an execution-grounded, cross-language framework that significantly improves the reliability of LLM-driven code vulnerability analysis by ensuring that all proposed fixes are confi…

View →

cs.CRcs.AIcs.LGRecentMay 22, 2026

An Empirical Evaluation of LLM-Generated Code Security Across Prompting Methods

Mohammed Kharma, Ahmed Sabbah, Mohammad Alkhanafseh, Mohammad Hammoudeh +1 more

The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…

View →

cs.CRcs.PLcs.SERecentApr 28, 2026

Symbolic Execution Meets Multi-LLM Orchestration: Detecting Memory Vulnerabilities in Incomplete Rust CVE Snippets

Zeyad Abdelrazek, Young Lee

The paper introduces a novel multi-LLM orchestration system combined with symbolic execution to successfully detect memory vulnerabilities in uncompilable, incomplete Rust CVE code snippets, achieving…

View →

cs.SEcs.CRcs.LGRecentMay 13, 2026

Code-Centric Detection of Vulnerability-Fixing Commits: A Unified Benchmark and Empirical Study

Nils Loose, Joseph Bienhüls, Kristoffer Hempel, Felix Mächtle +1 more

The paper evaluates code language model-based detection of vulnerability-fixing commits (VFCs) using a unified benchmark and concludes that code changes alone are insufficient for accurate detection,…

View →

cs.ARcs.AIcs.CRRecentApr 15, 2026

VeriCWEty: Embedding enabled Line-Level CWE Detection in Verilog

Prithwish Basu Roy, Zeng Wang, Anatolii Chuvashlov, Weihua Xiao +3 more

VeriCWEty proposes an embedding-based framework to detect and classify common software vulnerabilities (CWEs) in Verilog RTL code at both module and line levels, achieving high detection accuracy.

View →

cs.CRcs.SERecentMay 20, 2026

FuzzingBrain V2: A Multi-Agent LLM System for Automated Vulnerability Discovery and Reproduction

Ze Sheng, Zhicheng Chen, Qingxiao Xu, Kewen Zhu +1 more

FuzzingBrain V2 is a multi-agent LLM system that significantly improves automated vulnerability discovery by ensuring all reported bugs are fuzzer-reproducible and handling complex cross-function depe…

View →

cs.CRcs.AIRecentApr 1, 2026

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Anubhab Sahu, Diptisha Samanta, Reza Soosahabi

The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…

View →

cs.CRRecentMay 30, 2026

NeuroLog: Reasoning You Can Audit -- Neuro-Symbolic Vulnerability Discovery via LLM Facts, Datalog, and SMT

Sanjay Rawat

NeuroLog is a novel, build-free neuro-symbolic pipeline that combines LLM-derived dataflow facts, Datalog, and SMT solving to systematically discover and synthesize exploitable memory safety vulnerabi…

View →

cs.SEcs.CRRecentApr 9, 2026

Vulnerability Detection with Interprocedural Context in Multiple Languages: Assessing Effectiveness and Cost of Modern LLMs

Kevin Lira, Baldoino Fonseca, Davy Baía, Márcio Ribeiro +1 more

This study assesses the effectiveness and cost of four modern LLMs in detecting vulnerabilities that span multiple functions (interprocedural dependencies), finding that Gemini 3 Flash offers strong c…

View →

cs.CRcs.AIRecentApr 4, 2026

SecPI: Secure Code Generation with Reasoning Models via Security Reasoning Internalization

Hao Wang, Niels Mündler, Mark Vero, Jingxuan He +2 more

The paper introduces SecPI, a fine-tuning pipeline that teaches reasoning language models (RLMs) to autonomously internalize structured security reasoning, significantly improving secure code generati…

View →

cs.CRcs.AIcs.SERecentApr 7, 2026

Broken by Default: A Formal Verification Study of Security Vulnerabilities in AI-Generated Code

Dominik Blain, Maxime Noiseux

This study formally verified 3,500 AI-generated code artifacts and found that a majority (55.8%) contain exploitable security vulnerabilities, regardless of the LLM used.

View →