~ similar to 2604.17256v1· 20 results
The paper conducts an empirical evaluation of automated vulnerability detection tools across multiple software ecosystems using a curated ground-truth dataset derived from OSV, highlighting systematic…
The paper introduces NICE, a declarative framework that uses NixOS to build and automatically validate reproducible environments for demonstrating software vulnerabilities (CVEs), thereby improving th…
Ayush Garg, Sophia Hager, Jacob Montiel, Aditya Tiwari +4 more
RuleForge is an automated system that generates and validates detection rules for web vulnerabilities from structured CVE templates, significantly improving detection accuracy and reducing false posit…
Andrew Hamara, Dwight Horne, Aldehir Rojas, Timothy Kurniawan +4 more
SHIELDS is a multi-agent system that uses LLMs to automate OS hardening by iteratively proposing and refining fixes based on real-time system feedback, achieving up to 73% remediation success.
The paper introduces MCP Pitfall Lab, a comprehensive security testing framework that rigorously assesses and validates developer pitfalls in Model Context Protocol (MCP) tool servers under realistic…
Xiaochong Jiang, Shiqi Yang, Ziwei Li, Lifei Liu +2 more
ChainCaps introduces a novel runtime capability budgeting system that prevents 'permission laundering' in complex tool-using agents, significantly reducing attack success rates while maintaining benig…
VulGD is a dynamic, open-access graph database that aggregates cybersecurity data from multiple sources and uses LLM embeddings to improve vulnerability representation and risk assessment.
This paper systematically evaluates modern security logging standards (CIM, OCSF, ECS) using a novel framework to quantify their detection efficacy across diverse exploit scenarios, revealing critical…
The paper introduces CrossCommitVuln-Bench, a benchmark dataset demonstrating that many real-world Python vulnerabilities are introduced across multiple commits, making them invisible to standard per-…
Chengyan Ma, Jieke Shi, Ruidong Han, Ye Liu +2 more
The paper introduces SymTEE, an LLM-assisted symbolic execution framework that detects missing input validation vulnerabilities in TEE applications without needing complex, real TEE setups.
The paper introduces a provenance-aware vulnerability analysis approach that accurately identifies cross-ecosystem vulnerabilities in Python applications by resolving vendored native libraries to spec…
Zirui Chen, Qi Zhan, Jiayuan Zhou, Xing Hu +2 more
This paper conducts a large-scale empirical study demonstrating that Java library exploits can accurately identify affected versions, achieving high recall and precision, and proposes strategies for e…
The paper analyzes a large dataset of JavaScript packages to demonstrate that a small number of vulnerable dependencies can propagate vulnerabilities across a disproportionately large number of packag…
Yiheng Huang, Zhijia Zhao, Bihuan Chen, Susheng Wu +4 more
This paper introduces a component-centric framework and a novel detector, Connor, to understand and detect sophisticated, multi-component attacks targeting the Model Context Protocol (MCP) servers.
The paper introduces an open-source security framework that significantly improves cloud infrastructure security assessment by unifying identity and resource data, reducing false positives, and automa…
The paper introduces SecLens-R, a multi-stakeholder evaluation framework, demonstrating that LLM performance for vulnerability detection varies significantly depending on the specific priorities (e.g.…
The paper introduces Heimdall, an automated pipeline that uses LLMs and formal verification to safely and automatically migrate legacy, potentially buggy eBPF programs written in C to memory-safe Rust…
The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…
ZERO-APT introduces a novel closed-loop adversarial framework for automated penetration testing that simulates attacks against an intelligent, real-time defending system, achieving a high attack succe…
This study empirically measures the consistency and success rate of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation capabilit…