~ similar to 2605.13138v1· 20 results
The paper introduces CrossCommitVuln-Bench, a benchmark dataset demonstrating that many real-world Python vulnerabilities are introduced across multiple commits, making them invisible to standard per-…
The paper demonstrates that security patch detection models trained solely on publicly reported vulnerabilities (NVD) perform poorly when tested on real-world, unreported 'in-the-wild' patches, sugges…
Sicong Cao, Jinxuan Xu, Le Yu, Jing Yang +3 more
The paper proposes MAS-SZZ, a multi-agentic algorithm that significantly improves the identification of the earliest commit introducing a software vulnerability by combining root cause analysis with s…
This study conducts a large-scale longitudinal analysis of CodeQL, finding that while the tool is effective at detecting vulnerabilities, its detection capabilities are not guaranteed to be stable acr…
Aymen Lassoued, Nacef Mbarek, Bechir Dardouri, Bassem Ouni +2 more
The paper introduces VULNSCOUT-C, a compact, specialized transformer model that achieves state-of-the-art performance in C code vulnerability detection while maintaining low inference cost, making it…
The paper proposes a Residual Risk Scoring (RRS) framework that uses combined semantic and structural similarity analysis to estimate potential residual security risks in code after patching, finding…
This systematic mapping survey reviews label-efficient approaches for code vulnerability detection, synthesizing five paradigm families and providing a decision guide to navigate trade-offs.
The paper introduces a novel, large-scale dataset of vulnerable code snippets linked to CAPEC and CWE, generated using advanced LLMs, to improve automatic vulnerability detection.
Parteek Jamwal, Minghao Shao, Boyuan Chen, Achyuta Muthuvelan +14 more
The paper introduces RAVEN, a Retrieval-Augmented Vulnerability Exploration Network, which uses LLM agents and RAG to automatically generate comprehensive, structured vulnerability analysis reports fo…
Yujie Ma, Jialin Rong, Chenxi Yang, Lili Quan +3 more
The paper addresses the gap in understanding real-world LLM-in-the-loop vulnerabilities by creating the LLMCVE dataset and demonstrating that these vulnerabilities are significantly harder to repair t…
The paper introduces codebadger, a Model Context Protocol (MCP) server that integrates Joern's Code Property Graph (CPG) with LLMs, enabling large language models to perform large-scale, semantic prog…
The paper introduces LCC-LLM, a code-centric framework and dataset that significantly improves the reliability of malware attribution and static analysis by grounding LLM reasoning in comprehensive, m…
Zirui Chen, Qi Zhan, Jiayuan Zhou, Xing Hu +2 more
This paper conducts a large-scale empirical study demonstrating that Java library exploits can accurately identify affected versions, achieving high recall and precision, and proposes strategies for e…
Tian Dong, Yanjun Chen, Shoufeng Zhang, Huaien Zhang +5 more
This paper measures the prevalence of recurring vulnerability patterns (variants) across multiple AI infrastructure repositories and proposes INFRASCOPE, a framework to automatically detect these vari…
The paper introduces an execution-grounded, cross-language framework that significantly improves the reliability of LLM-driven code vulnerability analysis by ensuring that all proposed fixes are confi…
Syed Md Mukit Rashid, Abdullah Al Ishtiaq, Kai Tu, Yilu Dong +6 more
The paper introduces LogicEval, a systematic framework and dataset (LogicDS) to evaluate automated repair techniques for logical software vulnerabilities, finding that prompt sensitivity and context l…
Hwiwon Lee, Jiawei Liu, Dongjun Kim, Ziqi Zhang +2 more
The paper introduces SEC-bench Pro, a rigorous benchmark for evaluating LLM-based bug hunting on complex software, finding that even advanced agents struggle with long-horizon security tasks.
VulStyle introduces a multi-modal model that jointly encodes source code, non-terminal AST structure, and code stylometry features to achieve state-of-the-art performance in software vulnerability det…
VulKey introduces a novel LLM-based framework that uses a hierarchical abstraction of expert security knowledge to guide automatic vulnerability repair, achieving state-of-the-art performance on real-…
The paper introduces MOSAIC-Bench, a benchmark demonstrating that coding agents can ship exploitable code by complying with seemingly innocuous, staged tasks, a vulnerability that is not easily mitiga…