Papers similar to 2605.03956v1 · 20 results

cs.CR, cs.AI, cs.LG · Recent · May 22, 2026

An Empirical Evaluation of LLM-Generated Code Security Across Prompting Methods

Mohammed Kharma, Ahmed Sabbah, Mohammad Alkhanafseh, Mohammad Hammoudeh +1 more

The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…

cs.SE, cs.AI, cs.CL · Recent · Apr 13, 2026

AnyPoC: Universal Proof-of-Concept Test Generation for Scalable LLM-Based Bug Detection

Zijie Zhao, Chenyuan Yang, Weidong Wang, Yihan Yang +2 more

AnyPoC introduces a general multi-agent framework that reliably generates and validates executable Proof-of-Concept (PoC) tests from candidate bug reports, significantly improving automated bug detect…

cs.SE, cs.CR · Recent · Mar 27, 2026

A Large-scale Empirical Study on the Generalizability of Disclosed Java Library Vulnerability Exploits

Zirui Chen, Qi Zhan, Jiayuan Zhou, Xing Hu +2 more

This paper conducts a large-scale empirical study demonstrating that Java library exploits can accurately identify affected versions, achieving high recall and precision, and proposes strategies for e…

cs.CR · Recent · May 8, 2026

Longitudinal Analyses of SAST Tools: A CodeQL Case Study

Jean-Charles Noirot Ferrand, Kyle Domico, Yohan Beugin, Patrick McDaniel

This study conducts a large-scale longitudinal analysis of CodeQL, finding that while the tool is effective at detecting vulnerabilities, its detection capabilities are not guaranteed to be stable acr…

cs.CR, cs.SE · Recent · May 3, 2026

QASecClaw: A Multi-Agent LLM Approach for False Positive Reduction in Static Application Security Testing

Mohd Ruhul Ameen, Md Takrim Ul Alam, Akif Islam

QASecClaw, a multi-agent LLM system, significantly improves the accuracy of Static Application Security Testing (SAST) by using specialized LLM agents to filter out false positives, achieving an F1 sc…

cs.CR, cs.AI, cs.SE · Recent · Apr 22, 2026

Taint-Style Vulnerability Detection and Confirmation for Node.js Packages Using LLM Agent Reasoning

Ronghao Ni, Mihai Christodorescu, Limin Jia

The paper introduces LLMVD.js, a multi-stage LLM agent pipeline that effectively detects and confirms taint-style vulnerabilities in Node.js packages, achieving significantly higher confirmation rates…

cs.CR, cs.CY, cs.SE · Recent · Apr 15, 2026

Towards Personalizing Secure Programming Education with LLM-Injected Vulnerabilities

Matthew Frazier, Kostadin Damevski

The paper proposes using LLMs to inject personalized security vulnerabilities (CWEs) into students' own code to improve secure programming education, finding that while students found the method engag…

cs.CR, cs.LG · Recent · May 26, 2026

SEC-bench Pro: Can Language Models Solve Long-Horizon Software Security Tasks?

Hwiwon Lee, Jiawei Liu, Dongjun Kim, Ziqi Zhang +2 more

The paper introduces SEC-bench Pro, a rigorous benchmark for evaluating LLM-based bug hunting on complex software, finding that even advanced agents struggle with long-horizon security tasks.

cs.CR · Recent · Apr 1, 2026

Obfuscating Code Vulnerabilities against Static Analysis in JavaScript Code

Francesco Pagano, Lorenzo Pisu, Leonardo Regano, Davide Maiorca +2 more

This paper empirically demonstrates that current Static Application Security Testing (SAST) tools are fundamentally unreliable against common JavaScript obfuscation techniques, showing that obfuscatio…

cs.CR, cs.SE · Recent · Apr 7, 2026

Guiding Symbolic Execution with Static Analysis and LLMs for Vulnerability Discovery

Md Shafiuzzaman, Achintya Desai, Wenbo Guo, Tevfik Bultan

SAILOR automates the construction of symbolic execution harnesses by combining static analysis and LLM-based synthesis, significantly improving the scalability and effectiveness of vulnerability disco…

cs.CR, cs.AI · Recent · Apr 2, 2026

From Theory to Practice: Code Generation Using LLMs for CAPEC and CWE Frameworks

Murtuza Shahzad, Joseph Wilson, Ibrahim Al Azher, Hamed Alhoori +1 more

The paper introduces a novel, large-scale dataset of vulnerable code snippets linked to CAPEC and CWE, generated using advanced LLMs, to improve automatic vulnerability detection.

cs.CR, cs.AI, cs.MA · Recent · Apr 20, 2026

RAVEN: Retrieval-Augmented Vulnerability Exploration Network for Memory Corruption Analysis in User Code and Binary Programs

Parteek Jamwal, Minghao Shao, Boyuan Chen, Achyuta Muthuvelan +14 more

The paper introduces RAVEN, a Retrieval-Augmented Vulnerability Exploration Network, which uses LLM agents and RAG to automatically generate comprehensive, structured vulnerability analysis reports fo…

cs.CR · Recent · Apr 19, 2026

Original Sin of npm: A Study on Vulnerability Propagation in JavaScript Dependency Networks

Michael Robinson, Sajal Halder, Muhammad Ejaz Ahmed, Muhammad Ikram +2 more

The paper analyzes a large dataset of JavaScript packages to demonstrate that a small number of vulnerable dependencies can propagate vulnerabilities across a disproportionately large number of packag…

cs.SE, cs.CR, cs.PL · Recent · Apr 29, 2026

Adaptive and AI-Augmented Security Testing: A Systematic Survey of Program Analysis, Feedback-Driven Testing, and Hybrid Learning-Based Approaches

Michael Wienczkowski

This paper systematically surveys adaptive and AI-augmented security testing, concluding that a major gap exists—structural-adaptive fragmentation—where current systems fail to integrate structural pr…

cs.CR · Recent · Mar 30, 2026

VulnScout-C: A Lightweight Transformer for C Code Vulnerability Detection

Aymen Lassoued, Nacef Mbarek, Bechir Dardouri, Bassem Ouni +2 more

The paper introduces VULNSCOUT-C, a compact, specialized transformer model that achieves state-of-the-art performance in C code vulnerability detection while maintaining low inference cost, making it…

cs.CR, cs.SE · Recent · Apr 5, 2026

LLM-Enabled Open-Source Systems in the Wild: An Empirical Study of Vulnerabilities in GitHub Security Advisories

Fariha Tanjim Shifat, Hariswar Baburaj, Ce Zhou, Jaydeb Sarker +1 more

The paper analyzes GitHub security advisories for LLM-integrated open-source systems, finding that while most vulnerabilities map to existing code-level weaknesses, the architectural risks like Supply…

cs.CR · Recent · Apr 8, 2026

PoC-Adapt: Semantic-Aware Automated Vulnerability Reproduction with LLM Multi-Agents and Reinforcement Learning-Driven Adaptive Policy

Phan The Duy, Khoa Ngo-Khanh, Nguyen Huu Quyen, Van-Hau Pham

PoC-Adapt is an end-to-end framework that significantly improves the reliability and efficiency of automated vulnerability exploitation by integrating semantic state validation and reinforcement learn…

cs.SE, cs.CR · Recent · Apr 22, 2026

A Ground-Truth-Based Evaluation of Vulnerability Detection Across Multiple Ecosystems

Peter Mandl, Paul Mandl, Martin Häusl, Maximilian Auch

The paper conducts an empirical evaluation of automated vulnerability detection tools across multiple software ecosystems using a curated ground-truth dataset derived from OSV, highlighting systematic…

cs.CR, cs.LG · Recent · May 6, 2026

Agentic Vulnerability Reasoning on Windows COM Binaries

Hwiwon Lee, Jongseong Kim, Lingming Zhang

The paper introduces SLYP, an agentic pipeline that significantly improves the discovery of race condition vulnerabilities in Windows COM binaries and autonomously generates verified proof-of-concept…

cs.CR, cs.AI, cs.SE · Recent · May 5, 2026

MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents

Jonathan Steinberg, Oren Gal

The paper introduces MOSAIC-Bench, a benchmark demonstrating that coding agents can ship exploitable code by complying with seemingly innocuous, staged tasks, a vulnerability that is not easily mitiga…
