~ similar to 2605.03158v1· 20 results
Ayush Garg, Sophia Hager, Jacob Montiel, Aditya Tiwari +4 more
RuleForge is an automated system that generates and validates detection rules for web vulnerabilities from structured CVE templates, significantly improving detection accuracy and reducing false posit…
The paper introduces CrossCommitVuln-Bench, a benchmark dataset demonstrating that many real-world Python vulnerabilities are introduced across multiple commits, making them invisible to standard per-…
Vincent Koc, Patrick Erichsen, Jacob Tomlinson, Agustin Rivera +2 more
The paper analyzes a dataset of agent skills, demonstrating that different security scanners (VirusTotal, static analysis, SkillSpector) rarely agree, necessitating a layered governance approach for s…
Vincent Koc, Patrick Erichsen, Jacob Tomlinson, Agustin Rivera +2 more
The paper analyzes a dataset of agent skills, demonstrating that different security scanners (VirusTotal, static analysis, SkillSpector) rarely agree on maliciousness, necessitating layered security g…
The paper introduces a novel, large-scale dataset of vulnerable code snippets linked to CAPEC and CWE, generated using advanced LLMs, to improve automatic vulnerability detection.
VulGD is a dynamic, open-access graph database that aggregates cybersecurity data from multiple sources and uses LLM embeddings to improve vulnerability representation and risk assessment.
The paper introduces CyberCertBench, a new benchmark suite for evaluating LLMs against industry cybersecurity certifications, finding that while frontier models perform well on general knowledge, thei…
Pengyu Sun, Qishu Jin, Enhao Huang, Zifeng Kang +3 more
VIPER-MCP is a novel, end-to-end automated framework that detects and dynamically confirms the exploitability of taint-style vulnerabilities in Model Context Protocol (MCP) servers, achieving high-fid…
The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…
FixV2W introduces a knowledge graph embedding approach to significantly improve the accuracy of inconsistent CVE-CWE mappings in public vulnerability databases, achieving high prediction rates for exp…
SkillSieve introduces a three-layer hierarchical framework to detect malicious AI agent skills, achieving high F1 scores (0.920) on a large-scale benchmark while maintaining low operational costs.
Zhihao Chen, Ying Zhang, Yi Liu, Gelei Deng +6 more
This study conducts a large-scale empirical analysis of third-party LLM agent skills, identifying that credential leakage is a pervasive, cross-modal issue primarily caused by debug logging and result…
This paper systematically surveys adaptive and AI-augmented security testing, concluding that a major gap exists—structural-adaptive fragmentation—where current systems fail to integrate structural pr…
The paper introduces NICE, a declarative framework that uses NixOS to build and automatically validate reproducible environments for demonstrating software vulnerabilities (CVEs), thereby improving th…
Zhuoran Tan, Wenbo Guo, Taylor Brierley, Jiewen Luo +2 more
The paper introduces SynthChain, a comprehensive, multi-source synthetic testbed and dataset that demonstrates that detecting advanced software supply chain attacks requires fusing evidence from multi…
Aymen Lassoued, Nacef Mbarek, Bechir Dardouri, Bassem Ouni +2 more
The paper introduces VULNSCOUT-C, a compact, specialized transformer model that achieves state-of-the-art performance in C code vulnerability detection while maintaining low inference cost, making it…
The paper introduces the CAI Dataset, a massive, multi-terabyte corpus of real-world, hands-on cybersecurity LLM trajectories, designed to address the performance bottleneck caused by expert operator…
The paper introduces CSTM-Bench, a comprehensive benchmark and evaluation framework demonstrating that standard session-bound AI guardrails fail against sophisticated, cross-session attacks that accum…
Ze Sheng, Zhicheng Chen, Qingxiao Xu, Kewen Zhu +1 more
FuzzingBrain V2 is a multi-agent LLM system that significantly improves automated vulnerability discovery by ensuring all reported bugs are fuzzer-reproducible and handling complex cross-function depe…
The paper introduces a deterministic method to automatically synthesize initial SIEM detection rules (Sigma rules) from attack simulation findings, ensuring full traceability back to the specific orig…