Papers similar to 2604.06274v1

~ similar to 2604.06274v1· 20 results

cs.CRcs.AIcs.CLRecentMay 28, 2026

An Organization-Scoped LLM Agent Runtime Architecture for Regulated Cybersecurity Operations

George Fatouros, Georgios Makridis, George Kousiouris, John Soldatos +1 more

The paper proposes an organization-scoped LLM agent runtime architecture designed to provide an auditable, model-agnostic platform for regulated cybersecurity operations, integrating deeply with exist…

View →

cs.CRcs.AIcs.CLRecentMay 28, 2026

An Organization-Scoped LLM Agent Runtime Architecture for Regulated Cybersecurity Operations

George Fatouros, Georgios Makridis, George Kousiouris, John Soldatos +1 more

The paper proposes a novel, organization-scoped LLM agent runtime architecture designed specifically for regulated cybersecurity operations, ensuring auditable context and integration with existing se…

View →

cs.CRcs.AIRecentApr 11, 2026

Like a Hammer, It Can Build, It Can Break: Large Language Model Uses, Perceptions, and Adoption in Cybersecurity Operations on Reddit

Souradip Nath, Chih-Yi Huang, Aditi Ganapathi, Kashyap Thimmaraju +2 more

Analyzing Reddit discussions, the paper finds that while security practitioners see LLMs as useful for boosting productivity, their adoption is constrained by concerns over reliability, verification,…

View →

cs.CRRecentMay 8, 2026

An Automated Framework for Cybersecurity Policy Compliance Assessment Against Security Control Standards

Bikash Saha, Sandeep Kumar Shukla

The paper introduces PROPARAG, an automated framework that autonomously assesses how well organizational cybersecurity policies comply with standard security controls, achieving high F1 scores on real…

View →

cs.CRRecentMay 6, 2026

Evaluating the Reliability of Multiple Large Language Models in Risk Assessment: A CIS Controls Based Approach

Gustavo Roberto Pinto, Arthur do Prado Labaki, Rodrigo Sanches Miani

The study compared the cybersecurity risk assessment capabilities of five popular large language models (LLMs) against human experts, finding that LLMs consistently underestimated risks and require ma…

View →

cs.CRcs.AIcs.IRRecentApr 30, 2026

Toward Autonomous SOC Operations: End-to-End LLM Framework for Threat Detection, Query Generation, and Resolution in Security Operations

Md Hasan Saju, Akramul Azim

The paper proposes an end-to-end LLM framework that automates SOC operations by integrating ensemble-based threat detection, syntax-constrained query generation, and evidence-grounded incident resolut…

View →

cs.CRcs.AIRecentApr 22, 2026

CyberCertBench: Evaluating LLMs in Cybersecurity Certification Knowledge

Gustav Keppler, Ghada Elbez, Veit Hagenmeyer

The paper introduces CyberCertBench, a new benchmark suite for evaluating LLMs against industry cybersecurity certifications, finding that while frontier models perform well on general knowledge, thei…

View →

cs.CRRecentMay 27, 2026

Cybersecurity AI (CAI) Dataset

Víctor Mayoral-Vilches

The paper introduces the CAI Dataset, a massive, multi-terabyte corpus of real-world, hands-on cybersecurity LLM trajectories, designed to address the performance bottleneck caused by expert operator…

View →

cs.CRcs.AIcs.CLRecentApr 2, 2026

RuleForge: Automated Generation and Validation for Web Vulnerability Detection at Scale

Ayush Garg, Sophia Hager, Jacob Montiel, Aditya Tiwari +4 more

RuleForge is an automated system that generates and validates detection rules for web vulnerabilities from structured CVE templates, significantly improving detection accuracy and reducing false posit…

View →

cs.CRRecentApr 23, 2026

A Sociotechnical, Practitioner-Centered Approach to Technology Adoption in Cybersecurity Operations: An LLM Case

Francis Hahn, Mohd Mamoon, Alexandru G. Bardas, Michael Collins +3 more

The paper demonstrates that adopting LLM-based tools in cybersecurity operations requires a sociotechnical, practitioner-centered co-creation approach, which successfully overcame historical adoption…

View →

cs.CRcs.AIRecentMay 23, 2026

CyBOKClaw: Human-in-the-Loop CyBOK Mapping for Cybersecurity Curriculum

Yan Lin Aung, Kevin Togbe

CyBOKClaw is an interpretable human-in-the-loop retrieval framework designed to map broad cybersecurity keywords to the Cyber Security Body of Knowledge (CyBOK), achieving high expert-guided mapping a…

View →

cs.CRcs.AIcs.IRRecentApr 26, 2026

CyberCane: Neuro-Symbolic RAG for Privacy-Preserving Phishing Detection with Formal Ontology Reasoning

Safayat Bin Hakim, Aniqa Afzal, Qi Zhao, Vigna Majmundar +2 more

CyberCane is a neuro-symbolic framework that enhances phishing detection by combining symbolic rule analysis with privacy-preserving RAG and formal ontology reasoning, achieving high recall against AI…

View →

cs.AIcs.CRcs.IRRecentMay 3, 2026

CyberAId: AI-Driven Cybersecurity for Financial Service Providers

George Fatouros, Georgios Makridis, John Soldatos, Dimosthenis Kyriazis +17 more

The paper proposes CyberAId, a hybrid multi-agent system designed to enhance cybersecurity for financial institutions by integrating specialized LLM subagents with existing SIEM/XDR telemetry, address…

View →

cs.CRcs.MARecentApr 15, 2026

SoK: Security of Autonomous LLM Agents in Agentic Commerce

Qian'ang Mao, Jiaxin Wang, Ya Liu, Li Zhu +2 more

The paper develops a unified, cross-layer security framework for autonomous LLM agents operating in agentic commerce, identifying key attack vectors and proposing a layered defense architecture.

View →

cs.CRcs.AIRecentApr 7, 2026

CritBench: A Framework for Evaluating Cybersecurity Capabilities of Large Language Models in IEC 61850 Digital Substation Environments

Gustav Keppler, Moritz Gstür, Veit Hagenmeyer

The paper introduces CritBench, a novel framework to evaluate LLM cybersecurity capabilities specifically within IEC 61850 Digital Substation Operational Technology (OT) environments, finding that whi…

View →

cs.CRcs.AIRecentApr 7, 2026

From Incomplete Architecture to Quantified Risk: Multimodal LLM-Driven Security Assessment for Cyber-Physical Systems

Shaofei Huang, Christopher M. Poskitt, Lwin Khin Shar

The paper introduces ASTRAL, a multimodal LLM-driven framework that reconstructs and analyzes fragmented cyber-physical system architectures to enable comprehensive and quantitative security risk asse…

View →

cs.CRRecentMay 6, 2026

SOCpilot: Verifying Policy Compliance for LLM-Assisted Incident Response

Sidnei Barbieri, Leonardo Vaz de Meneses, Ágney Lopes Roth Ferraz, Lourenço Alves Pereira Júnior

SOCpilot is a system that verifies the compliance of LLM-drafted incident response plans against mandatory policies and required procedural steps, significantly improving the reliability of AI-assiste…

View →

cs.CYcs.AIcs.CRRecentApr 6, 2026

AI Agents Under EU Law

Luca Nannini, Adam Leon Smith, Michele Joshua Maggini, Enrico Panai +5 more

This paper provides a systematic regulatory mapping and compliance architecture for AI agents operating under the complex web of EU laws, concluding that high-risk agents with untraceable behavioral d…

View →

cs.CRcs.AIRecentMar 25, 2026

Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

Rishikesh Sahay, Bell Eapen, Weizhi Meng, Md Rasel Al Mamun +4 more

The paper proposes an automated, LLM-enabled threat hunting framework integrated with Splunk to help SOC analysts autonomously monitor evolving threats and prioritize suspicious network traffic.

View →

cs.CRcs.SERecentMay 11, 2026

Usability as a Weapon: Attacking the Safety of LLM-Based Code Generation via Usability Requirements

Yue Li, Xiao Li, Hao Wu, Yue Zhang +4 more

This paper introduces UPAttack, a novel threat model demonstrating that focusing on explicit usability requirements can cause LLMs to generate insecure code by neglecting implicit security constraints…

View →