Papers similar to 2606.03386v1

~ similar to 2606.03386v1· 20 results

cs.CRcs.AIcs.CLRecentMar 25, 2026

AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective

The paper proposes a unified closed-loop threat taxonomy to systematically analyze and defend foundation models by explicitly framing the bidirectional security interactions between data and models.

View →

cs.CRcs.AIRecentApr 21, 2026

Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps

Alankrit Chona, Igor Kozlov, Ambuj Kumar

The paper introduces a challenging benchmark for LLM agents to perform unsupervised threat hunting on raw Windows event logs, finding that current frontier models perform poorly and are not ready for…

View →

cs.CRcs.AIcs.LGRecentMay 22, 2026

Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection

Ahmed Sabbah, Mohammed Kharma, Radi Jarrar, Samer Zein +1 more

This study longitudinally evaluates the adversarial robustness of Android malware detection systems over a decade, finding that temporal separation significantly degrades robustness due to concept dri…

View →

cs.CRRecentMay 18, 2026

From Detection to Response: A Deep Learning and Retrieval-Augmented Generation Framework for Network Intrusion Mitigation

Md Navid Bin Islam, Sajal Saha, Senior Member

The paper introduces an end-to-end framework that not only detects network intrusions using deep learning but also generates actionable, citation-grounded mitigation reports using a Retrieval-Augmente…

View →

cs.CRcs.AIcs.LGEmpiricalRecentJun 12, 2026

AgentCyberRange: Benchmarking Frontier AI Systems in Realistic Cyber Ranges

Fengyu Liu, Jiarun Dai, Yihe Fan, Wuyuao Mai +10 more

The paper introduces AgentCyberRange, an open, multi-range infrastructure for measuring autonomous cyber attack capability in realistic cyber ranges, and evaluates six frontier AI systems.

View →

cs.CRRecentApr 17, 2026

Modeling Sparse and Bursty Vulnerability Sightings: Forecasting Under Data Constraints

Cedric Bonhomme, Alexandre Dulaunoy

The paper investigates forecasting sparse and bursty vulnerability sightings, concluding that traditional time-series models like SARIMAX are inadequate, and count-based methods like Poisson regressio…

View →

cs.CRcs.AIRecentMay 8, 2026

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios

Taein Lim, Seongyong Ju, Munhyeok Kim, Hyunjun Kim +1 more

The paper introduces CyBiasBench, a comprehensive benchmark that quantifies the inherent, agent-specific bias in LLM agents' attack selection patterns in cybersecurity scenarios.

View →

cs.CEcs.AIcs.CRRecentApr 8, 2026

SentinelSphere: Integrating AI-Powered Real-Time Threat Detection with Cybersecurity Awareness Training

Nikolaos D. Tantaroudas, Ilias Karachalios, Andrew J. McCracken

SentinelSphere is an AI platform that integrates advanced deep learning for real-time threat detection with an LLM-powered training system to holistically address both technical and human-factor cyber…

View →

cs.CRcs.CLRecentMay 14, 2026

Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

Karthik Raghu Iyer, Yazdan Jamshidi, Nicholas Bray, Alexey A. Shvets

The paper introduces a comprehensive taxonomy and auditing framework to assess the collective coverage of existing LLM attack benchmarks, revealing significant and systematic gaps in current testing m…

View →

cs.CRcs.MARecentJun 4, 2026

ZERO-APT: A Closed-Loop Adversarial Framework for LLM-Driven Automated Penetration Testing under Intelligent Defense

Anlan Zheng, Tiantian Zhu

ZERO-APT introduces a novel closed-loop adversarial framework for automated penetration testing that simulates attacks against an intelligent, real-time defending system, achieving a high attack succe…

View →

cs.CRRecentMay 27, 2026

Cybersecurity AI (CAI) Dataset

Víctor Mayoral-Vilches

The paper introduces the CAI Dataset, a massive, multi-terabyte corpus of real-world, hands-on cybersecurity LLM trajectories, designed to address the performance bottleneck caused by expert operator…

View →

cs.CRRecentMay 4, 2026

Zero Day Attacks: Novel Behaviour or Novel Vulnerability?

Nnamdi Jibunoh, Sara Khanchi, Adetokunbo Makanju

The paper argues that zero-day attacks primarily exploit undisclosed vulnerabilities rather than exhibiting novel behaviors, advocating for vulnerability-centric detection methods over purely behavior…

View →

cs.CRcs.AIRecentMay 16, 2026

STRIDE-AI: A Threat Modeling Framework for Generative AI Security Assessment

Tsafac Nkombong Regine Cyrille, Franziska Schwarz

The paper introduces STRIDE-AI, a novel threat modeling framework that adapts classical STRIDE for generative AI, successfully reducing the attack success rate of a tested LLM chatbot from 80% to 15%.

View →

cs.CRRecentMay 10, 2026

Operationalizing Cybersecurity Governance for Mitigation Planning with Attack-Path Modeling and Reinforcement Learning

Philip Huff, Dakota Dale, Harshith Guduru, Rohan Singh +1 more

The paper proposes a system that operationalizes cybersecurity governance frameworks by integrating them with attack-path modeling and Deep Reinforcement Learning to generate practical, resource-const…

View →

cs.CRcs.AIRecentJun 4, 2026

GenTI: Benchmarking LLMs for Autonomous IDPS Rule Generation for Unseen Attacks

Hassan Jalil Hadi, Rehana Yasmin, Ali Shoker

The paper introduces GenTI, a novel LLM-driven benchmark and dataset, to automatically generate high-quality, deployable IDPS rules for detecting unseen and zero-day cyber attacks.

View →

cs.CRcs.AIRecentMay 28, 2026

How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency

Galip Tolga Erdem

This study empirically measures the consistency and success rate of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation capabilit…

View →

cs.CRcs.AIRecentMay 28, 2026

How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency

Galip Tolga Erdem

This study empirically measures the consistency and effectiveness of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation rates am…

View →

cs.CRcs.AIEmpiricalRecentJul 22, 2026

Small, Free, and Effective: Orchestrating Open-Weight Small Language Models to Outperform Single LLM for Malware Analysis

Adel ElZemity, Shujun Li, Budi Arief

This paper investigates the performance of orchestrated ensembles of small language models in analyzing malware detonation reports, surpassing the capabilities of single large language models and open…

View →

cs.CRcs.AIRecentMar 20, 2026

Improving Generalization on Cybersecurity Tasks with Multi-Modal Contrastive Learning

Jianan Huang, Rodolfo V. Valentim, Luca Vassio, Matteo Boffa +3 more

The paper proposes a multi-modal contrastive learning framework to improve the generalization of machine learning models in cybersecurity by transferring knowledge from rich textual vulnerability desc…

View →