Similar to 2603.18740v2 · 20 results
The paper introduces MOSAIC-Bench, a benchmark demonstrating that coding agents can ship exploitable code by complying with seemingly innocuous, staged tasks, a vulnerability that is not easily mitiga…
The paper introduces False Security Confidence (FSC), a new metric to measure the inherent prevalence of security vulnerabilities in code generated by LLMs that are otherwise functionally correct, eve…
This paper systematically surveys adaptive and AI-augmented security testing, concluding that a major gap exists—structural-adaptive fragmentation—where current systems fail to integrate structural pr…
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
Nils Loose, Joseph Bienhüls, Kristoffer Hempel, Felix Mächtle +1 more
The paper evaluates code language model-based detection of vulnerability-fixing commits (VFCs) using a unified benchmark and concludes that code changes alone are insufficient for accurate detection,…
QASecClaw, a multi-agent LLM system, significantly improves the accuracy of Static Application Security Testing (SAST) by using specialized LLM agents to filter out false positives, achieving an F1 sc…
Maofei Chen, Laifu Wang, Yue Qin, Yuan Wang +2 more
The paper demonstrates that using raw source text for fine-tuning LLMs on vulnerability detection causes high false-positive rates by memorizing surface-level syntax, a problem mitigated by using Abst…
Fariha Tanjim Shifat, Hariswar Baburaj, Ce Zhou, Jaydeb Sarker +1 more
The paper analyzes GitHub security advisories for LLM-integrated open-source systems, finding that while most vulnerabilities map to existing code-level weaknesses, architectural risks such as Supply…
The paper introduces LCC-LLM, a code-centric framework and dataset that significantly improves the reliability of malware attribution and static analysis by grounding LLM reasoning in comprehensive, m…
Pengyu Sun, Qishu Jin, Enhao Huang, Zifeng Kang +3 more
VIPER-MCP is a novel, end-to-end automated framework that detects and dynamically confirms the exploitability of taint-style vulnerabilities in Model Context Protocol (MCP) servers, achieving high-fid…
The paper empirically analyzes the susceptibility of seven widely used AI-assisted development tools (MCP clients) to prompt injection via tool-poisoning, revealing significant disparities in their se…
Bushra Sabir, Shigang Liu, Seung Ick Jang, Sharif Abuadbba +5 more
The paper evaluates multi-LLM strategies for secure code generation, finding that hybrid pipelines combining ensembling, static analysis, and patching achieve the strongest security performance, outpe…
Parteek Jamwal, Minghao Shao, Boyuan Chen, Achyuta Muthuvelan +14 more
The paper introduces RAVEN, a Retrieval-Augmented Vulnerability Exploration Network, which uses LLM agents and RAG to automatically generate comprehensive, structured vulnerability analysis reports fo…
This paper empirically evaluates the security of code generated by seven popular LLMs and finds that all evaluated models generate code containing critical or high-severity vulnerabilities.
Shihao Weng, Yang Feng, Jinrui Zhang, Xiaofei Xie +2 more
The paper introduces ARGUS, a defense mechanism that uses provenance-aware decision auditing to protect LLM agents from sophisticated, context-aware prompt injection attacks, significantly reducing th…
The paper proposes an architectural proxy (MCP) to enforce robust, reliable tool access control for LLM agents, demonstrating that this structural enforcement is necessary because prompt-based restric…
The paper systematically maps LLM agent vulnerabilities by testing 10,000 prompt variations, finding that 'goal reframing' language is the primary trigger for exploitation, rather than broad adversari…
Kevin Lira, Baldoino Fonseca, Davy Baía, Márcio Ribeiro +1 more
This study assesses the effectiveness and cost of four modern LLMs in detecting vulnerabilities that span multiple functions (interprocedural dependencies), finding that Gemini 3 Flash offers strong c…
The paper introduces an execution-grounded, cross-language framework that significantly improves the reliability of LLM-driven code vulnerability analysis by ensuring that all proposed fixes are confi…
The paper introduces Refute-or-Promote, an adversarial multi-agent review system that significantly improves the precision of LLM-assisted defect discovery by filtering out false positives.