~ similar to 2604.01554v1· 20 results
The paper proposes a Residual Risk Scoring (RRS) framework that uses combined semantic and structural similarity analysis to estimate potential residual security risks in code after patching, finding…
Saastha Vasan, Yuzhou Nie, Kaie Chen, Yigitcan Kaya +5 more
MalwarePT introduces a novel binary-level foundation model, pretrained on Windows PE code-section bytes using a ModernBERT-style encoder, demonstrating superior transfer learning capabilities across v…
The paper proposes and evaluates a novel embedding model for bidirectional function association between source code and decompiled/stripped code, significantly outperforming existing models.
The paper introduces codebadger, a Model Context Protocol (MCP) server that integrates Joern's Code Property Graph (CPG) with LLMs, enabling large language models to perform large-scale, semantic prog…
This paper quantifies the polymorphic capacity of a commercial LLM, demonstrating that it can cheaply generate large populations of structurally diverse, yet behaviorally equivalent, offensive code pa…
The paper introduces Heimdall, an automated pipeline that uses LLMs and formal verification to safely and automatically migrate legacy, potentially buggy eBPF programs written in C to memory-safe Rust…
The paper introduces MOSAIC-Bench, a benchmark demonstrating that coding agents can ship exploitable code by complying with seemingly innocuous, staged tasks, a vulnerability that is not easily mitiga…
Zijun Feng, Yuming Feng, Yu Wang, Weizhe Zhang +3 more
GoAT-X introduces a novel framework that structures cross-chain smart contract auditing as a Graph of Auditing Thoughts, significantly improving the detection of complex, semantic vulnerabilities in m…
FunFuzz introduces a multi-island evolutionary fuzzing framework that uses LLMs to generate structured inputs, achieving superior compiler coverage and discovering more unique failures compared to exi…
The paper introduces PeAR, a static binary rewriting framework that proves static binary instrumentation (SBI) is a practical and effective alternative to dynamic binary instrumentation (DBI) for high…
Hwiwon Lee, Jiawei Liu, Dongjun Kim, Ziqi Zhang +2 more
The paper introduces SEC-bench Pro, a rigorous benchmark for evaluating LLM-based bug hunting on complex software, finding that even advanced agents struggle with long-horizon security tasks.
AsmRAG is a novel framework that improves malware detection by treating it as an evidence-based retrieval task using a code-specialized LLM, achieving high accuracy while providing transparent forensi…
The paper introduces False Security Confidence (FSC), a new metric to measure the inherent prevalence of security vulnerabilities in code generated by LLMs that are otherwise functionally correct, eve…
The paper introduces REBench, a comprehensive, standardized benchmark dataset designed to enable fair and rigorous evaluation of Large Language Models (LLMs) on complex binary reverse engineering task…
The paper introduces BOUNDARY FLOW, an LLVM-based framework that enhances kernel fuzzing and analysis by extracting per-task, state-aware data-flow information (arguments and return values) at functio…
The paper introduces FVSpec, a large-scale benchmark that translates thousands of real-world Python property-based tests into formal Lean 4 specifications to evaluate AI models for formal software ver…
This systematic mapping survey reviews label-efficient approaches for code vulnerability detection, synthesizing five paradigm families and providing a decision guide to navigate trade-offs.
The paper introduces Phoenix, a training-free multi-agent framework that detects code vulnerabilities by synthesizing project-specific behavioral contracts, significantly outperforming existing method…
This paper proposes a lightweight, fast vulnerability detection pipeline for C/C++ code using simple token n-grams and basic code metrics, achieving a PR-AUC of 0.642 on random splits but showing limi…
Nils Loose, Joseph Bienhüls, Kristoffer Hempel, Felix Mächtle +1 more
The paper evaluates code language model-based detection of vulnerability-fixing commits (VFCs) using a unified benchmark and concludes that code changes alone are insufficient for accurate detection,…