Similar to 2605.12170v1 · 20 results
The paper presents an approach to automatically generate a large number of diverse and complex cybersecurity scenarios that model enterprise IT systems for training purposes.
The paper proposes Dynamic Cyber Ranges, an advanced cyber range environment using LLM-driven Defender agents to counter the saturation of traditional security benchmarks, demonstrating that these dyn…
The paper evaluates Language Model Agents (LMAs) for red-teaming by benchmarking their ability to perform lateral movement, finding that expert-defined action plans are most effective, though all moda…
Zhiyuan Li, Jingzheng Wu, Xiang Ling, Xing Cui +1 more
This paper provides the first comprehensive security analysis of the Agent Skills framework, identifying severe structural vulnerabilities that require fundamental architectural changes rather than si…
The paper introduces the CAI Dataset, a massive, multi-terabyte corpus of real-world, hands-on cybersecurity LLM trajectories, designed to address the performance bottleneck caused by expert operator…
The paper proposes an organization-scoped LLM agent runtime architecture designed to provide an auditable, model-agnostic platform for regulated cybersecurity operations, integrating deeply with exist…
The paper proposes a novel, organization-scoped LLM agent runtime architecture designed specifically for regulated cybersecurity operations, ensuring auditable context and integration with existing se…
This paper introduces and evaluates a scalable, reproducible 'CTF as a Service' (CaaS) platform designed to simplify the infrastructure management required for cybersecurity training.
The paper introduces CritBench, a novel framework to evaluate LLM cybersecurity capabilities specifically within IEC 61850 Digital Substation Operational Technology (OT) environments, finding that whi…
Muhammad Bilal, Jon Crowcroft, Ruizhi Wang, Xiaolong Xu +1 more
The paper surveys the use of LLMs for agentic NetOps and AIOps, arguing that operational reliability depends not on the model itself, but on robust surrounding machinery and workflow-centered evaluati…
The paper empirically evaluates domain-adapted and general-purpose LLMs for structured threat modelling (STRIDE on 5G security), finding that domain adaptation and model size do not guarantee reliable…
The paper proposes a standardized, zonal architecture and an open-source prototype for a dedicated Cyber Range (CR) specifically designed for comprehensive and repeatable Wi-Fi security training and e…
Zelin Zhang, Qi Li, Jie Cao, Lingshuang Liu +1 more
The paper analyzes the escalating security and safety threats posed by generative AI systems as they transition from merely generating content to executing real-world actions via tools and agents, fin…
The paper proposes CyberAId, a hybrid multi-agent system designed to enhance cybersecurity for financial institutions by integrating specialized LLM subagents with existing SIEM/XDR telemetry, address…
The paper proposes a novel, empirical methodology called 'backchaining' to derive and prioritize Loss of Control (LoC) mitigations by analyzing the errors an AI system makes on mission-specific nation…
The paper introduces STRIATUM-CTF, a modular agentic framework that uses a standardized context protocol to enable LLMs to perform multi-step, stateful reasoning for general-purpose CTF solving, achie…
This paper demonstrates that using a communication algorithm (CommFormer) with heterogeneous agents significantly improves the speed and performance of multi-agent reinforcement learning for autonomou…
The paper proposes an autonomous red teaming framework combining LLMs and RL to generate sophisticated, multi-stage cyber attack campaigns, demonstrating its necessity for evaluating robust AI-enabled…
Chris Hicks, Elizabeth Bates, Shae McFadden, Isaac Symes Thompson +11 more
This paper synthesizes expert knowledge from a workshop to provide a comprehensive framework and best-practice guidelines for developing high-quality reinforcement learning environments for autonomous…
LanG is a governance-aware, open-source agentic AI platform that unifies security operations by providing advanced correlation, automated rule generation, and attack reconstruction capabilities.