~ similar to 2604.27000v1· 20 results
ZERO-APT introduces a novel closed-loop adversarial framework for automated penetration testing that simulates attacks against an intelligent, real-time defending system, achieving a high attack succe…
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
Ayush Garg, Sophia Hager, Jacob Montiel, Aditya Tiwari +4 more
RuleForge is an automated system that generates and validates detection rules for web vulnerabilities from structured CVE templates, significantly improving detection accuracy and reducing false posit…
The paper proposes using LLMs to inject personalized security vulnerabilities (CWEs) into students' own code to improve secure programming education, finding that while students found the method engag…
QASecClaw, a multi-agent LLM system, significantly improves the accuracy of Static Application Security Testing (SAST) by using specialized LLM agents to filter out false positives, achieving an F1 sc…
The paper introduces False Security Confidence (FSC), a new metric to measure the inherent prevalence of security vulnerabilities in code generated by LLMs that are otherwise functionally correct, eve…
Pengyu Sun, Qishu Jin, Enhao Huang, Zifeng Kang +3 more
VIPER-MCP is a novel, end-to-end automated framework that detects and dynamically confirms the exploitability of taint-style vulnerabilities in Model Context Protocol (MCP) servers, achieving high-fid…
Red-MIRROR is a novel multi-agent LLM system that automates complex web penetration testing by integrating a memory-reflection backbone, achieving superior performance on industry benchmarks.
Parteek Jamwal, Minghao Shao, Boyuan Chen, Achyuta Muthuvelan +14 more
The paper introduces RAVEN, a Retrieval-Augmented Vulnerability Exploration Network, which uses LLM agents and RAG to automatically generate comprehensive, structured vulnerability analysis reports fo…
Ze Sheng, Zhicheng Chen, Qingxiao Xu, Kewen Zhu +1 more
FuzzingBrain V2 is a multi-agent LLM system that significantly improves automated vulnerability discovery by ensuring all reported bugs are fuzzer-reproducible and handling complex cross-function depe…
Houjun Liu, Lisa Einstein, John Yang, Joachim Baumann +4 more
SecureForge is an automated pipeline that significantly reduces cybersecurity vulnerabilities in LLM-generated code by optimizing system prompts, achieving up to a 48% reduction in output vulnerabilit…
The paper introduces a novel, large-scale dataset of vulnerable code snippets linked to CAPEC and CWE, generated using advanced LLMs, to improve automatic vulnerability detection.
The paper investigates how AI coding assistants shift developers' security focus from proactive prevention to reactive review, finding that this structural change is reinforced by current tool interac…
This paper demonstrates that LLM-based security code review systems are highly susceptible to sophisticated, iterative contextual bias attacks, which can successfully reintroduce vulnerabilities.
This study conducts a large-scale longitudinal analysis of CodeQL, finding that while the tool is effective at detecting vulnerabilities, its detection capabilities are not guaranteed to be stable acr…
The paper proposes an attestation-aware promotion gate to mitigate supply-chain risks in LLM pipelines by cryptographically verifying and enforcing claims about training and release artifacts before d…
Hanzhi Liu, Chaofan Shou, Xiaonan Liu, Hongbo Wen +3 more
The paper introduces AgentFlow, a novel framework that uses a typed graph DSL and feedback-driven optimization to automatically synthesize and improve multi-agent harnesses for discovering security vu…
The paper introduces the Mitigation-Aware Chain-of-Thought (MA-CoT) framework, which significantly enhances the security reliability of code generated by LLMs across multiple languages and models.
This systematic mapping survey reviews label-efficient approaches for code vulnerability detection, synthesizing five paradigm families and providing a decision guide to navigate trade-offs.
SDLLMFuzz is a novel dynamic-static framework that combines LLM-based structure-aware input generation with semantic feedback from crash analysis to significantly improve vulnerability discovery in st…