~ similar to 2605.03619v2 · 20 results
This paper proposes a structured pipeline using LLMs to generate and evaluate obfuscated XSS payloads, demonstrating that while LLMs can generate samples, they currently struggle to ensure payloads ma…
The paper introduces the first byte-native Large Language Model (LLM) capable of analyzing raw executable binary data, achieving high accuracy in tasks like malware and architecture classification.
The paper introduces uGen, the first LLM-driven framework that uses a retrieval-augmented, multi-agent design to automatically generate functionally correct microarchitectural attack Proof-of-Concepts…
This paper addresses the lack of research on adversarial malware generation for Linux ELF binaries by developing a new semantic-preserving generator that achieves a high evasion rate against modern de…
Shenao Yan, Shimaa Ahmed, Shan Jin, Sunpreet S. Arora +3 more
The paper introduces CodeScan, a novel black-box framework that detects data poisoning in code generation LLMs by analyzing structural similarities across multiple generations to identify recurring, v…
AsmRAG is a novel framework that improves malware detection by treating it as an evidence-based retrieval task using a code-specialized LLM, achieving high accuracy while providing transparent forensi…
This study empirically measures the consistency and success rate of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation capabilit…
This study empirically measures the consistency and effectiveness of autonomous LLM penetration testing across multiple services, finding statistically significant differences in exploitation rates am…
Houjun Liu, Lisa Einstein, John Yang, Joachim Baumann +4 more
SecureForge is an automated pipeline that significantly reduces cybersecurity vulnerabilities in LLM-generated code by optimizing system prompts, achieving up to a 48% reduction in output vulnerabilit…
The paper introduces codebadger, a Model Context Protocol (MCP) server that integrates Joern's Code Property Graph (CPG) with LLMs, enabling large language models to perform large-scale, semantic prog…
The paper introduces RedShell, a generative AI tool designed to help ethical hackers generate syntactically and semantically valid malicious PowerShell code, addressing the challenge of data scarcity…
The paper introduces RedShell, a hardware-efficient framework that uses fine-tuned LLMs to automate the generation of syntactically and semantically valid offensive PowerShell code for pentesting.
Yifei Wang, Tianlin Li, Xiaohan Zhang, Yida Yang +2 more
This paper introduces a novel class of backdoor attacks that exploit the numerical side effects of LLM inference optimization, achieving high success rates while maintaining clean accuracy.
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
The paper introduces Trident, a novel malware detection system that combines static features, LLM-derived behavioral rules, and direct LLM analysis to achieve superior robustness against concept drift…
Mark Vero, Fabian Kaczmarczyck, Ivan Petrov, Ilia Shumailov +5 more
The paper introduces Honeyval, a comprehensive evaluation framework, to rigorously test LLM-powered HTTP honeypots, demonstrating that these honeypots provide substantially longer and harder-to-detect…
Mark Vero, Fabian Kaczmarczyck, Ivan Petrov, Ilia Shumailov +5 more
The paper introduces Honeyval, a comprehensive evaluation framework, to rigorously test LLM-powered HTTP honeypots, demonstrating that these systems provide substantially longer and harder-to-detect i…
This paper analyzes large-scale reasoning traces from LLM-based binary vulnerability analysis, identifying four structured, token-level implicit patterns that govern how LLMs explore code paths.
The paper proposes an automated, standardized framework to empirically compare the security quality of code generated through human-only, LLM-only, and hybrid collaboration methods.