~ similar to 2605.09792v1· 20 results
The paper introduces C-MADF, a causally constrained multi-agent framework that significantly reduces false positives in autonomous cyber defense by restricting response actions to structurally consist…
DeepStage is a deep reinforcement learning framework that achieves autonomous, stage-aware defense against multi-stage APT campaigns by fusing graph-based telemetry and predicting attacker stages.
The paper proposes an autonomous red teaming framework combining LLMs and RL to generate sophisticated, multi-stage cyber attack campaigns, demonstrating its necessity for evaluating robust AI-enabled…
The paper models the trade-off between deploying increasingly capable AI systems and managing associated cyber risks, finding a 'deployment paradox' where high-loss environments with weak governance l…
Saeid Jamshidi, Negar Shahabi, Foutse Khomh, Carol Fung +1 more
The paper proposes a two-timescale governance framework using a multi-agent LLM to safely update and guide RL agents for SDN-IoT defense, significantly improving performance and stability under advers…
The paper introduces a queueing-theoretic framework to model dynamic cyber-attack surfaces, developing an adaptive reinforcement learning defense policy that significantly reduces active vulnerabiliti…
The paper proposes a novel, empirical methodology called 'backchaining' to derive and prioritize Loss of Control (LoC) mitigations by analyzing the errors an AI system makes on mission-specific nation…
The paper introduces STRIATUM-CTF, a modular agentic framework that uses a standardized context protocol to enable LLMs to perform multi-step, stateful reasoning for general-purpose CTF solving, achie…
DeepXplain introduces an explainable deep reinforcement learning framework that enhances the trustworthiness and effectiveness of autonomous cyber defense against multi-stage APT campaigns by integrat…
LanG is a governance-aware, open-source agentic AI platform that unifies security operations by providing advanced correlation, automated rule generation, and attack reconstruction capabilities.
The paper introduces a challenging benchmark for LLM agents to perform unsupervised threat hunting on raw Windows event logs, finding that current frontier models perform poorly and are not ready for…
The paper evaluates Language Model Agents (LMAs) for red-teaming by benchmarking their ability to perform lateral movement, finding that expert-defined action plans are most effective, though all moda…
The paper introduces NASimJax, a GPU-accelerated framework that significantly speeds up network simulation for reinforcement learning, enabling large-scale, realistic training for penetration testing.
Kerri Prinos, Lilianne Brush, Cameron Denton, Zhanqi Wang +4 more
The paper proposes a tool-mediated LLM architecture for autonomous cyber defense, formally proving its stability and demonstrating that it significantly reduces an attacker's expected payoff in real-w…
The paper proposes a declarative, autonomous, self-protecting framework for securing complex 5G/6G networks by leveraging a standardized security ontology and automated graph reasoning to neutralize l…
The Pen-Strategist framework addresses the limitations of current automated penetration testing by integrating a novel domain-specific reasoning model for robust strategy formation and an accurate cla…
The paper introduces STRIDE-AI, a novel threat modeling framework that adapts classical STRIDE for generative AI, successfully reducing the attack success rate of a tested LLM chatbot from 80% to 15%.
This paper proposes a gap-prioritization framework to bridge the gap between theoretical cyber attack prediction research and practical operational deployment by identifying critical implementation hu…
Rishikesh Sahay, Bell Eapen, Weizhi Meng, Md Rasel Al Mamun +4 more
The paper proposes an automated, LLM-enabled threat hunting framework integrated with Splunk to help SOC analysts autonomously monitor evolving threats and prioritize suspicious network traffic.
The paper introduces PROPARAG, an automated framework that autonomously assesses how well organizational cybersecurity policies comply with standard security controls, achieving high F1 scores on real…