Papers similar to 2605.07515v1

~ similar to 2605.07515v1· 20 results

cs.CRcs.AIcs.CLRecentMay 28, 2026

An Organization-Scoped LLM Agent Runtime Architecture for Regulated Cybersecurity Operations

George Fatouros, Georgios Makridis, George Kousiouris, John Soldatos +1 more

The paper proposes an organization-scoped LLM agent runtime architecture designed to provide an auditable, model-agnostic platform for regulated cybersecurity operations, integrating deeply with exist…

View →

cs.CRcs.AIcs.CLRecentMay 28, 2026

An Organization-Scoped LLM Agent Runtime Architecture for Regulated Cybersecurity Operations

George Fatouros, Georgios Makridis, George Kousiouris, John Soldatos +1 more

The paper proposes a novel, organization-scoped LLM agent runtime architecture designed specifically for regulated cybersecurity operations, ensuring auditable context and integration with existing se…

View →

cs.CRRecentMay 6, 2026

SOCpilot: Verifying Policy Compliance for LLM-Assisted Incident Response

Sidnei Barbieri, Leonardo Vaz de Meneses, Ágney Lopes Roth Ferraz, Lourenço Alves Pereira Júnior

SOCpilot is a system that verifies the compliance of LLM-drafted incident response plans against mandatory policies and required procedural steps, significantly improving the reliability of AI-assiste…

View →

cs.CRcs.AIRecentApr 7, 2026

Towards the Development of an LLM-Based Methodology for Automated Security Profiling in Compliance with Ukrainian Cybersecurity Regulations

Daniil Shafranskyi, Iryna Stopochkina, Mykola Ilin

The paper proposes an LLM-enhanced methodology using RAG to automate the creation of security profiles, ensuring compliance with Ukrainian cybersecurity regulations and international best practices.

View →

cs.AIRecentMay 29, 2026

LLM-FACETS: A Privacy-Preserving Framework for Evaluating LLM Transparency and Accountability

Tom Lucas, Alessio Buscemi, Alfredo Capozucca, German Castignani +1 more

LLM-FACETS introduces an open-source, privacy-preserving framework designed to enable non-technical domain experts and compliance officers to audit and evaluate the transparency and accountability of…

View →

cs.CRcs.AIcs.CLRecentApr 2, 2026

RuleForge: Automated Generation and Validation for Web Vulnerability Detection at Scale

Ayush Garg, Sophia Hager, Jacob Montiel, Aditya Tiwari +4 more

RuleForge is an automated system that generates and validates detection rules for web vulnerabilities from structured CVE templates, significantly improving detection accuracy and reducing false posit…

View →

cs.CRRecentMay 10, 2026

Operationalizing Cybersecurity Governance for Mitigation Planning with Attack-Path Modeling and Reinforcement Learning

Philip Huff, Dakota Dale, Harshith Guduru, Rohan Singh +1 more

The paper proposes a system that operationalizes cybersecurity governance frameworks by integrating them with attack-path modeling and Deep Reinforcement Learning to generate practical, resource-const…

View →

cs.CRcs.AIRecentApr 22, 2026

CyberCertBench: Evaluating LLMs in Cybersecurity Certification Knowledge

Gustav Keppler, Ghada Elbez, Veit Hagenmeyer

The paper introduces CyberCertBench, a new benchmark suite for evaluating LLMs against industry cybersecurity certifications, finding that while frontier models perform well on general knowledge, thei…

View →

cs.CRcs.AIRecentApr 21, 2026

Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps

Alankrit Chona, Igor Kozlov, Ambuj Kumar

The paper introduces a challenging benchmark for LLM agents to perform unsupervised threat hunting on raw Windows event logs, finding that current frontier models perform poorly and are not ready for…

View →

cs.CRcs.AIRecentMay 17, 2026

Ablating Safety: Mechanisms for Removing Alignment in Language Models for Security Applications

Isaac David, Arthur Gervais

The paper proposes Ablating Safety, a controlled protocol for removing safety alignment from language models, demonstrating that targeted de-alignment can significantly boost security performance whil…

View →

cs.CRcs.CLRecentMay 14, 2026

Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

Karthik Raghu Iyer, Yazdan Jamshidi, Nicholas Bray, Alexey A. Shvets

The paper introduces a comprehensive taxonomy and auditing framework to assess the collective coverage of existing LLM attack benchmarks, revealing significant and systematic gaps in current testing m…

View →

cs.CRRecentMay 6, 2026

Evaluating the Reliability of Multiple Large Language Models in Risk Assessment: A CIS Controls Based Approach

Gustavo Roberto Pinto, Arthur do Prado Labaki, Rodrigo Sanches Miani

The study compared the cybersecurity risk assessment capabilities of five popular large language models (LLMs) against human experts, finding that LLMs consistently underestimated risks and require ma…

View →

cs.CRcs.SERecentMar 22, 2026

SkillProbe: Security Auditing for Emerging Agent Skill Marketplaces via Multi-Agent Collaboration

Zihan Guo, Zhiyu Chen, Xiaohang Nie, Jianghao Lin +2 more

The paper proposes SkillProbe, a multi-agent security auditing framework, demonstrating that high-popularity skills in LLM agent marketplaces are often insecure due to systemic combinatorial risks.

View →

cs.CRcs.AIRecentApr 11, 2026

Like a Hammer, It Can Build, It Can Break: Large Language Model Uses, Perceptions, and Adoption in Cybersecurity Operations on Reddit

Souradip Nath, Chih-Yi Huang, Aditi Ganapathi, Kashyap Thimmaraju +2 more

Analyzing Reddit discussions, the paper finds that while security practitioners see LLMs as useful for boosting productivity, their adoption is constrained by concerns over reliability, verification,…

View →

cs.CRRecentApr 29, 2026

OpenSOC-AI: Democratizing Security Operations with Parameter Efficient LLM Log Analysis

Chaitanya Vilas Garware, Sharif Noor Zisad

OpenSOC-AI is a lightweight framework that uses parameter-efficient fine-tuning of a small LLM to automate threat classification and severity assessment from raw security logs, significantly improving…

View →

cs.CRRecentMay 27, 2026

Cybersecurity AI (CAI) Dataset

Víctor Mayoral-Vilches

The paper introduces the CAI Dataset, a massive, multi-terabyte corpus of real-world, hands-on cybersecurity LLM trajectories, designed to address the performance bottleneck caused by expert operator…

View →

cs.CRcs.AIRecentApr 7, 2026

LanG -- A Governance-Aware Agentic AI Platform for Unified Security Operations

Anes Abdennebi, Nadjia Kara, Laaziz Lahlou, Hakima Ould-Slimane

LanG is a governance-aware, open-source agentic AI platform that unifies security operations by providing advanced correlation, automated rule generation, and attack reconstruction capabilities.

View →

cs.AIRecentMay 27, 2026

Diagnosing Live Within-Policy Instruction Conflicts in LLM Agents with Witnessed Resolution Profiles

Lu Yan, Xuan Chen, Xiangyu Zhang

The paper introduces WIRE, a pipeline for diagnosing live intra-policy rule conflicts in LLM agents by identifying and testing specific rule pairs within a single prompt policy that can co-govern a re…

View →

cs.CRcs.LGRecentMay 23, 2026

CyberMaskQA: A Privacy-Aware Benchmark for Evaluating Large Language Models in Cybersecurity Question Answering

Matilda Gaddi, Jin Noh, Onat Gungor, Tajana Rosing

The paper introduces CYBERMASKQA, a novel privacy-aware benchmark designed to evaluate Large Language Models' ability to perform accurate cybersecurity question answering while simultaneously preservi…

View →

cs.CRcs.AIRecentMar 25, 2026

Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

Rishikesh Sahay, Bell Eapen, Weizhi Meng, Md Rasel Al Mamun +4 more

The paper proposes an automated, LLM-enabled threat hunting framework integrated with Splunk to help SOC analysts autonomously monitor evolving threats and prioritize suspicious network traffic.

View →