~ similar to 2605.24663v1 · 20 results
The paper introduces CyberCertBench, a new benchmark suite for evaluating LLMs against industry cybersecurity certifications, finding that while frontier models perform well on general knowledge, thei…
The paper systematically evaluates advanced retrieval-augmented generation (RAG) architectures for Cyber Threat Intelligence (CTI), demonstrating that a hybrid graph-text approach significantly improv…
The paper introduces the CAI Dataset, a massive, multi-terabyte corpus of real-world, hands-on cybersecurity LLM trajectories, designed to address the performance bottleneck caused by expert operator…
Jiutian Zeng, Junjie Li, Chengwei Dai, Jie Liang +12 more
The paper introduces XekRung, a frontier large language model for cybersecurity, which achieves state-of-the-art performance on domain-specific benchmarks through a comprehensive training and evaluati…
The paper proposes an end-to-end LLM framework that automates SOC operations by integrating ensemble-based threat detection, syntax-constrained query generation, and evidence-grounded incident resolut…
Seonwoo Kim, Jinwoo Kim, Daegyu Kang, Daeseong Kim +1 more
The paper introduces ANCHOR, a schema-agnostic system that constructs knowledge graphs from Cyber Threat Intelligence by dynamically discovering and validating against large ontologies, overcoming lim…
The paper introduces a challenging benchmark for LLM agents to perform unsupervised threat hunting on raw Windows event logs, finding that current frontier models perform poorly and are not ready for…
Jianan Huang, Rodolfo V. Valentim, Luca Vassio, Matteo Boffa +3 more
The paper proposes a multi-modal contrastive learning framework to improve the generalization of machine learning models in cybersecurity by transferring knowledge from rich textual vulnerability desc…
Liangyi Huang, Zichen Liu, Fei Shao, Shang Ma +4 more
The paper introduces GRID, an end-to-end framework that significantly improves the construction of security knowledge graphs from cyber threat intelligence by replacing unstable LLM-based supervision…
Safayat Bin Hakim, Aniqa Afzal, Qi Zhao, Vigna Majmundar +2 more
CyberCane is a neuro-symbolic framework that enhances phishing detection by combining symbolic rule analysis with privacy-preserving RAG and formal ontology reasoning, achieving high recall against AI…
The paper empirically evaluates domain-adapted and general-purpose LLMs for structured threat modelling (STRIDE on 5G security), finding that domain adaptation and model size do not guarantee reliable…
OpenSOC-AI is a lightweight framework that uses parameter-efficient fine-tuning of a small LLM to automate threat classification and severity assessment from raw security logs, significantly improving…
The paper proposes an LLM-enhanced methodology using RAG to automate the creation of security profiles, ensuring compliance with Ukrainian cybersecurity regulations and international best practices.
The paper introduces TorchSight, an open-source local system using a fine-tuned Qwen 3.5 27B model that achieves high accuracy (95.0%) in classifying sensitive security documents without relying on ex…
Taein Lim, Seongyong Ju, Munhyeok Kim, Hyunjun Kim +1 more
The paper introduces CyBiasBench, a comprehensive benchmark that quantifies the inherent, agent-specific bias in LLM agents' attack selection patterns in cybersecurity scenarios.
The paper introduces RefWalk, a novel framework designed to improve regulatory compliance question answering by ensuring rigorous citation traceability and explicit per-rule attribution across complex…
The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…
The paper introduces WebKnoGraph, an open-source framework for systematically evaluating internal linking strategies on websites by modeling the site as a graph and assessing trade-offs between author…
Jiling Zhou, Aisvarya Adeseye, Seppo Virtanen, Antti Hakkala +1 more
The paper proposes a structured prompt engineering framework to enhance the integrity and reliability of Chain-of-Thought (CoT) reasoning in LLMs, demonstrating significant improvements in security-se…
Yuming Xu, Mingtao Zhang, Zhuohan Ge, Haoyang Li +6 more
This paper proposes a comprehensive taxonomy (SLOT) to systematically categorize security risks, attacks, and defenses specific to Retrieval-Augmented Generation (RAG), clarifying that these risks are…