~ similar to 2604.11950v1· 20 results
The paper proposes agentic fuzzing, a novel bug-finding approach where deep agents perform direct reasoning based on historical bugs to discover logic bugs in mature codebases.
Hwiwon Lee, Jiawei Liu, Dongjun Kim, Ziqi Zhang +2 more
The paper introduces SEC-bench Pro, a rigorous benchmark for evaluating LLM-based bug hunting on complex software, finding that even advanced agents struggle with long-horizon security tasks.
Shravya Kanchi, Xiaoyan Zang, Ying Zhang, Danfeng Yao +1 more
The paper introduces PoVSmith, an agent-based system that uses large language models and call path analysis to automatically generate and assess proof-of-vulnerability tests, significantly improving t…
The paper introduces SLYP, an agentic pipeline that significantly improves the discovery of race condition vulnerabilities in Windows COM binaries and autonomously generates verified proof-of-concept…
PoC-Adapt is an end-to-end framework that significantly improves the reliability and efficiency of automated vulnerability exploitation by integrating semantic state validation and reinforcement learn…
This study formally verified 3,500 AI-generated code artifacts and found that a majority (55.8%) contain exploitable security vulnerabilities, regardless of the LLM used.
The paper introduces TRAILS~, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…
QASecClaw, a multi-agent LLM system, significantly improves the accuracy of Static Application Security Testing (SAST) by using specialized LLM agents to filter out false positives, achieving an F1 sc…
Xiang Liu, Sa Song, Zhaowei Zhang, Huiying Lan +5 more
The paper introduces Agora, a domain-aware multi-agent framework that successfully detects deep, previously unknown logic bugs in complex consensus protocols, outperforming existing LLM-based analysis…
The paper introduces an execution-grounded, cross-language framework that significantly improves the reliability of LLM-driven code vulnerability analysis by ensuring that all proposed fixes are confi…
The paper introduces a novel multi-LLM orchestration system combined with symbolic execution to successfully detect memory vulnerabilities in uncompilable, incomplete Rust CVE code snippets, achieving…
The paper proposes an automated, standardized framework to empirically compare the security quality of code generated through human-only, LLM-only, and hybrid collaboration methods.
The paper introduces SPECA, an LLM-driven framework that audits distributed protocols by deriving and enforcing security properties from natural-language specifications, enabling cross-implementation…
Houjun Liu, Lisa Einstein, John Yang, Joachim Baumann +4 more
SecureForge is an automated pipeline that significantly reduces cybersecurity vulnerabilities in LLM-generated code by optimizing system prompts, achieving up to a 48% reduction in output vulnerabilit…
The paper introduces SafeAudit, a meta-audit framework that systematically enumerates test cases and uses a quantitative metric to uncover significant residual unsafe behaviors in LLM agents that exis…
The paper introduces Refute-or-Promote, an adversarial multi-agent review system that significantly improves the precision of LLM-assisted defect discovery by filtering out false positives.
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
VeriCWEty proposes an embedding-based framework to detect and classify common software vulnerabilities (CWEs) in Verilog RTL code at both module and line levels, achieving high detection accuracy.
SAILOR automates the construction of symbolic execution harnesses by combining static analysis and LLM-based synthesis, significantly improving the scalability and effectiveness of vulnerability disco…
Ze Sheng, Zhicheng Chen, Qingxiao Xu, Kewen Zhu +1 more
FuzzingBrain V2 is a multi-agent LLM system that significantly improves automated vulnerability discovery by ensuring all reported bugs are fuzzer-reproducible and handling complex cross-function depe…