~ similar to 2605.07515v1· 20 results
The paper proposes an organization-scoped LLM agent runtime architecture designed to provide an auditable, model-agnostic platform for regulated cybersecurity operations, integrating deeply with exist…
The paper proposes a novel, organization-scoped LLM agent runtime architecture designed specifically for regulated cybersecurity operations, ensuring auditable context and integration with existing se…
SOCpilot is a system that verifies the compliance of LLM-drafted incident response plans against mandatory policies and required procedural steps, significantly improving the reliability of AI-assiste…
The paper proposes an LLM-enhanced methodology using RAG to automate the creation of security profiles, ensuring compliance with Ukrainian cybersecurity regulations and international best practices.
LLM-FACETS introduces an open-source, privacy-preserving framework designed to enable non-technical domain experts and compliance officers to audit and evaluate the transparency and accountability of…
Ayush Garg, Sophia Hager, Jacob Montiel, Aditya Tiwari +4 more
RuleForge is an automated system that generates and validates detection rules for web vulnerabilities from structured CVE templates, significantly improving detection accuracy and reducing false posit…
Philip Huff, Dakota Dale, Harshith Guduru, Rohan Singh +1 more
The paper proposes a system that operationalizes cybersecurity governance frameworks by integrating them with attack-path modeling and Deep Reinforcement Learning to generate practical, resource-const…
The paper introduces CyberCertBench, a new benchmark suite for evaluating LLMs against industry cybersecurity certifications, finding that while frontier models perform well on general knowledge, thei…
The paper introduces a challenging benchmark for LLM agents to perform unsupervised threat hunting on raw Windows event logs, finding that current frontier models perform poorly and are not ready for…
The paper proposes Ablating Safety, a controlled protocol for removing safety alignment from language models, demonstrating that targeted de-alignment can significantly boost security performance whil…
The paper introduces a comprehensive taxonomy and auditing framework to assess the collective coverage of existing LLM attack benchmarks, revealing significant and systematic gaps in current testing m…
The study compared the cybersecurity risk assessment capabilities of five popular large language models (LLMs) against human experts, finding that LLMs consistently underestimated risks and require ma…
Zihan Guo, Zhiyu Chen, Xiaohang Nie, Jianghao Lin +2 more
The paper proposes SkillProbe, a multi-agent security auditing framework, demonstrating that high-popularity skills in LLM agent marketplaces are often insecure due to systemic combinatorial risks.
Analyzing Reddit discussions, the paper finds that while security practitioners see LLMs as useful for boosting productivity, their adoption is constrained by concerns over reliability, verification,…
OpenSOC-AI is a lightweight framework that uses parameter-efficient fine-tuning of a small LLM to automate threat classification and severity assessment from raw security logs, significantly improving…
The paper introduces the CAI Dataset, a massive, multi-terabyte corpus of real-world, hands-on cybersecurity LLM trajectories, designed to address the performance bottleneck caused by expert operator…
LanG is a governance-aware, open-source agentic AI platform that unifies security operations by providing advanced correlation, automated rule generation, and attack reconstruction capabilities.
The paper introduces WIRE, a pipeline for diagnosing live intra-policy rule conflicts in LLM agents by identifying and testing specific rule pairs within a single prompt policy that can co-govern a re…
The paper introduces CYBERMASKQA, a novel privacy-aware benchmark designed to evaluate Large Language Models' ability to perform accurate cybersecurity question answering while simultaneously preservi…
Rishikesh Sahay, Bell Eapen, Weizhi Meng, Md Rasel Al Mamun +4 more
The paper proposes an automated, LLM-enabled threat hunting framework integrated with Splunk to help SOC analysts autonomously monitor evolving threats and prioritize suspicious network traffic.