~ similar to 2605.25836v1· 20 results
Pei-Yu Tseng, Lan Zhang, ZihDwo Yeh, Xiaoyan Sun +2 more
The paper introduces IOCRegex-gen, an automated LLM-based system that converts Indicators of Compromise (IOCs) into syntactically and semantically correct regular expressions, achieving high accuracy…
The paper introduces a comprehensive taxonomy and auditing framework to assess the collective coverage of existing LLM attack benchmarks, revealing significant and systematic gaps in current testing m…
AutoVerifier is an LLM-based agentic framework that automates the end-to-end verification of complex technical claims, enabling non-experts to generate evidence-backed intelligence assessments.
Haobo Zhang, Zhenhua Xu, Junxian Li, Shangfeng Sheng +2 more
AttnDiff introduces a data-efficient white-box framework that extracts intrinsic attention-based fingerprints to verify the provenance and detect unauthorized derivation of large language models (LLMs…
The paper introduces GenTI, a novel LLM-driven benchmark and dataset, to automatically generate high-quality, deployable IDPS rules for detecting unseen and zero-day cyber attacks.
The paper introduces the CAI Dataset, a massive, multi-terabyte corpus of real-world, hands-on cybersecurity LLM trajectories, designed to address the performance bottleneck caused by expert operator…
The paper identifies a universal, statistically predictable distribution (Mandelbrot) governing LLM outputs, enabling a highly efficient, model-agnostic scoring primitive for provenance and quality as…
The paper empirically evaluates domain-adapted and general-purpose LLMs for structured threat modelling (STRIDE on 5G security), finding that domain adaptation and model size do not guarantee reliable…
The paper systematically evaluates advanced retrieval-augmented generation (RAG) architectures for Cyber Threat Intelligence (CTI), demonstrating that a hybrid graph-text approach significantly improv…
The paper introduces a challenging benchmark for LLM agents to perform unsupervised threat hunting on raw Windows event logs, finding that current frontier models perform poorly and are not ready for…
COBALT-TLA introduces a neuro-symbolic verification loop that successfully and autonomously discovers novel cross-chain bridge vulnerabilities by integrating an LLM with the TLA+ model checker.
The paper proposes an embarrassingly simple detector that monitors model extraction attacks by testing whether the aggregate distribution of incoming LLM queries deviates from the historical distribut…
The paper introduces Trace, a forensic framework that fingerprints the model family of autonomous AI attack agents using terminal behavior, enabling subsequent prompt injection to extract system promp…
The paper demonstrates that relying on strict regular-expression parsing for evaluating LLM-based security log classifiers introduces systematic errors, potentially causing a functional model to appea…
Karima Makhlouf, Lamiaa Basyoni, Syed Khaderi, Gabriel Marquez +3 more
This paper conducts a structured ablation study using a unified threat model to evaluate how various system factors (like model architecture and retrieval configuration) influence different types of p…
The paper proposes an attestation-aware promotion gate to mitigate supply-chain risks in LLM pipelines by cryptographically verifying and enforcing claims about training and release artifacts before d…
The paper introduces OrgForge-IT, a novel, verifiable synthetic benchmark that guarantees cross-artifact consistency for testing LLM-based insider threat detection, revealing critical performance gaps…
The paper introduces CyberCertBench, a new benchmark suite for evaluating LLMs against industry cybersecurity certifications, finding that while frontier models perform well on general knowledge, thei…
The paper introduces Critical-CoT, a novel two-stage fine-tuning defense framework that equips LLMs with critical thinking abilities to detect and reject malicious reasoning steps introduced by advanc…
Sen Fang, Weiyuan Ding, Zhezhen Cao, Zhou Yang +1 more
AEGIS is a novel multi-agent framework that grounds vulnerability reasoning by reconstructing per-variable dependency chains over a Code Property Graph, achieving state-of-the-art performance on the P…