~ similar to 2606.05725v1· 20 results
The paper introduces HIDBench, a new benchmark for evaluating LLMs' ability to perform host-based intrusion detection using complex, noisy system logs, finding that model performance degrades signific…
Karima Makhlouf, Lamiaa Basyoni, Syed Khaderi, Gabriel Marquez +3 more
This paper conducts a structured ablation study using a unified threat model to evaluate how various system factors (like model architecture and retrieval configuration) influence different types of p…
The paper identifies a universal, statistically predictable distribution (Mandelbrot) governing LLM outputs, enabling a highly efficient, model-agnostic scoring primitive for provenance and quality as…
Pei Chen, Geng Hong, Xinyi Wu, Mengying Wu +5 more
This paper systematically analyzes the resilience of LLM-enhanced search engines against black-hat SEO attacks, finding that while they block most traditional attacks, they remain vulnerable to sophis…
The paper empirically evaluates domain-adapted and general-purpose LLMs for structured threat modelling (STRIDE on 5G security), finding that domain adaptation and model size do not guarantee reliable…
The paper introduces the first byte-native Large Language Model (LLM) capable of analyzing raw executable binary data, achieving high accuracy in tasks like malware and architecture classification.
The paper demonstrates that relying on strict regular-expression parsing for evaluating LLM-based security log classifiers introduces systematic errors, potentially causing a functional model to appea…
This paper introduces an attribution-driven analysis of encoder-based Large Language Models (LLMs) for network intrusion detection, demonstrating that the models make decisions based on meaningful tra…
The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…
The paper proposes a framework to intentionally evade malware detectors by adding a small number of benign API imports, successfully demonstrating targeted misclassification into a chosen benign categ…
This study empirically measures the consistency and success rate of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation capabilit…
This study empirically measures the consistency and effectiveness of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation rates am…
The paper introduces $(l, b)$-inextractability, a new formal measure that demonstrates that standard indistinguishability properties are insufficient for guaranteeing protection against data extractio…
This paper demonstrates that current AI model extraction defenses, which assume attacks come from single sources, are easily bypassed by coordinated, distributed threat actors.
The paper proposes a graph-based framework for detecting attacks in LLM agent tool-call traffic, finding that content-level embeddings are crucial for high accuracy and that tree ensembles on these em…
The paper introduces SecureBreak, a manually annotated, safety-oriented dataset designed to help detect harmful outputs from large language models (LLMs) that bypass existing security alignments.
The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…
The paper proposes a unified closed-loop threat taxonomy to systematically analyze and defend foundation models by explicitly framing the bidirectional security interactions between data and models.
FlowGuard introduces an identity-independent defense using flow matching to detect data-free model stealing attacks by identifying synthetic queries as out-of-distribution based on their lower-dimensi…
The paper introduces the Safety Asymmetry Score (SAS) to measure how a model's vulnerability to adversarial content changes based on whether the malicious input arrives via the user message, tool meta…