~ similar to 2605.21773v1· 20 results
The paper introduces a challenging benchmark for LLM agents to perform unsupervised threat hunting on raw Windows event logs, finding that current frontier models perform poorly and are not ready for…
Mark Vero, Fabian Kaczmarczyck, Ivan Petrov, Ilia Shumailov +5 more
The paper introduces Honeyval, a comprehensive evaluation framework, to rigorously test LLM-powered HTTP honeypots, demonstrating that these honeypots provide substantially longer and harder-to-detect…
Mark Vero, Fabian Kaczmarczyck, Ivan Petrov, Ilia Shumailov +5 more
The paper introduces Honeyval, a comprehensive evaluation framework, to rigorously test LLM-powered HTTP honeypots, demonstrating that these systems provide substantially longer and harder-to-detect i…
The paper proposes an embarrassingly simple detector that monitors model extraction attacks by testing whether the aggregate distribution of incoming LLM queries deviates from the historical distribut…
The paper empirically evaluates domain-adapted and general-purpose LLMs for structured threat modelling (STRIDE on 5G security), finding that domain adaptation and model size do not guarantee reliable…
The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…
Taein Lim, Seongyong Ju, Munhyeok Kim, Hyunjun Kim +1 more
The paper introduces CyBiasBench, a comprehensive benchmark that quantifies the inherent, agent-specific bias in LLM agents' attack selection patterns in cybersecurity scenarios.
The paper proposes a graph-based framework for detecting attacks in LLM agent tool-call traffic, finding that content-level embeddings are crucial for high accuracy and that tree ensembles on these em…
The paper introduces 'log-substrate prompt injection,' demonstrating that attacker-controlled log fields can be used to manipulate LLM-powered security analysis, with persona hijacking and context man…
This study empirically measures the consistency and success rate of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation capabilit…
This study empirically measures the consistency and effectiveness of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation rates am…
Samuel Ndichu, Tao Ban, Seiichi Ozawa, Takeshi Takahashi +1 more
NLLog introduces a lightweight system that converts structured security logs into natural language sentences for improved anomaly detection, achieving high performance with low false-positive rates su…
Samuel Ndichu, Tao Ban, Seiichi Ozawa, Takeshi Takahashi +1 more
NLLog is a lightweight pipeline that rewrites system-generated logs into natural language for improved analysis and comprehension.
The paper introduces CYBERMASKQA, a novel privacy-aware benchmark designed to evaluate Large Language Models' ability to perform accurate cybersecurity question answering while simultaneously preservi…
The paper introduces CyberCertBench, a new benchmark suite for evaluating LLMs against industry cybersecurity certifications, finding that while frontier models perform well on general knowledge, thei…
He Yang Yuan, Xin Wang, Kundi Yao, An Ran Chen +2 more
The paper characterizes logging code security issues and benchmarks LLMs, finding that while LLMs can moderately detect these issues, they struggle significantly with reliably generating correct code…
Vivek Dahiya, Sunny Nehra, Vipul Dholariya, Bhavik Shangari +1 more
The paper evaluates frontier LLMs on cybersecurity tasks using dual-mode benchmarks and concludes that general-purpose models are insufficient, advocating for specialized, vertical foundation models.
This paper introduces an attribution-driven analysis of encoder-based Large Language Models (LLMs) for network intrusion detection, demonstrating that the models make decisions based on meaningful tra…
The paper introduces a comprehensive taxonomy and auditing framework to assess the collective coverage of existing LLM attack benchmarks, revealing significant and systematic gaps in current testing m…
The paper introduces an end-to-end framework that not only detects network intrusions using deep learning but also generates actionable, citation-grounded mitigation reports using a Retrieval-Augmente…