~ similar to 2604.24398v1· 20 results
The paper introduces a novel multi-LLM orchestration system combined with symbolic execution to successfully detect memory vulnerabilities in uncompilable, incomplete Rust CVE code snippets, achieving…
Nils Loose, Joseph Bienhüls, Kristoffer Hempel, Felix Mächtle +1 more
The paper evaluates code language model-based detection of vulnerability-fixing commits (VFCs) using a unified benchmark and concludes that code changes alone are insufficient for accurate detection,…
The paper introduces CrossCommitVuln-Bench, a benchmark dataset demonstrating that many real-world Python vulnerabilities are introduced across multiple commits, making them invisible to standard per-…
The paper introduces Patch2Vuln, a pipeline that uses an LLM agent to reconstruct security vulnerabilities by analyzing differences between old and new Linux binary packages, successfully localizing p…
Pengyu Sun, Qishu Jin, Enhao Huang, Zifeng Kang +3 more
VIPER-MCP is a novel, end-to-end automated framework that detects and dynamically confirms the exploitability of taint-style vulnerabilities in Model Context Protocol (MCP) servers, achieving high-fid…
Ze Sheng, Zhicheng Chen, Qingxiao Xu, Kewen Zhu +1 more
FuzzingBrain V2 is a multi-agent LLM system that significantly improves automated vulnerability discovery by ensuring all reported bugs are fuzzer-reproducible and handling complex cross-function depe…
QASecClaw, a multi-agent LLM system, significantly improves the accuracy of Static Application Security Testing (SAST) by using specialized LLM agents to filter out false positives, achieving an F1 sc…
Parteek Jamwal, Minghao Shao, Boyuan Chen, Achyuta Muthuvelan +14 more
The paper introduces RAVEN, a Retrieval-Augmented Vulnerability Exploration Network, which uses LLM agents and RAG to automatically generate comprehensive, structured vulnerability analysis reports fo…
The paper demonstrates that security patch detection models trained solely on publicly reported vulnerabilities (NVD) perform poorly when tested on real-world, unreported 'in-the-wild' patches, sugges…
The paper introduces an execution-grounded, cross-language framework that significantly improves the reliability of LLM-driven code vulnerability analysis by ensuring that all proposed fixes are confi…
The paper introduces a novel, large-scale dataset of vulnerable code snippets linked to CAPEC and CWE, generated using advanced LLMs, to improve automatic vulnerability detection.
The paper introduces LLMVD.js, a multi-stage LLM agent pipeline that effectively detects and confirms taint-style vulnerabilities in Node.js packages, achieving significantly higher confirmation rates…
AgenticVM is a multi-agent framework that uses LLMs and specialized tools to automate and drastically reduce the volume of software vulnerabilities into actionable, prioritized queues.
Zi Liang, Qipeng Xie, Jun He, Bohuan Xue +6 more
The paper introduces Argus, a novel multi-agent framework that reorchestrates Static Application Security Testing (SAST) by integrating LLMs with existing tools to achieve superior, reliable, and cost…
Tian Dong, Yanjun Chen, Shoufeng Zhang, Huaien Zhang +5 more
This paper measures the prevalence of recurring vulnerability patterns (variants) across multiple AI infrastructure repositories and proposes INFRASCOPE, a framework to automatically detect these vari…
Yujie Ma, Jialin Rong, Chenxi Yang, Lili Quan +3 more
The paper addresses the gap in understanding real-world LLM-in-the-loop vulnerabilities by creating the LLMCVE dataset and demonstrating that these vulnerabilities are significantly harder to repair t…
Hanzhi Liu, Chaofan Shou, Xiaonan Liu, Hongbo Wen +3 more
The paper introduces AgentFlow, a novel framework that uses a typed graph DSL and feedback-driven optimization to automatically synthesize and improve multi-agent harnesses for discovering security vu…
This paper systematically surveys adaptive and AI-augmented security testing, concluding that a major gap exists—structural-adaptive fragmentation—where current systems fail to integrate structural pr…
The paper introduces Phoenix, a training-free multi-agent framework that detects code vulnerabilities by synthesizing project-specific behavioral contracts, significantly outperforming existing method…
Shenao Wang, Xinyi Hou, Zhao Liu, Yanjie Zhao +4 more
This paper introduces Agentic Workflow Injection (AWI), a new class of vulnerability in LLM-powered GitHub Actions, and presents TaintAWI, a novel taint-analysis tool that identifies hundreds of explo…