~ similar to 2604.23196v1· 20 results
The paper introduces LCC-LLM, a code-centric framework and dataset that significantly improves the reliability of malware attribution and static analysis by grounding LLM reasoning in comprehensive, m…
The paper introduces the first byte-native Large Language Model (LLM) capable of analyzing raw executable binary data, achieving high accuracy in tasks like malware and architecture classification.
The paper introduces LLM4CodeRE, a domain-adaptive LLM framework that significantly improves bidirectional code reverse engineering by unifying assembly-to-source and source-to-assembly translation.
The paper introduces Trident, a novel malware detection system that combines static features, LLM-derived behavioral rules, and direct LLM analysis to achieve superior robustness against concept drift…
The paper proposes a zero-label malware family classification framework that uses a weighted hierarchical ensemble of large language models (LLMs) to classify malware without requiring labeled trainin…
Saastha Vasan, Yuzhou Nie, Kaie Chen, Yigitcan Kaya +5 more
MalwarePT introduces a novel binary-level foundation model, pretrained on Windows PE code-section bytes using a ModernBERT-style encoder, demonstrating superior transfer learning capabilities across v…
This paper empirically evaluates the use of Retrieval-Augmented Generation (RAG) for malware explanation and finds that RAG frequently degrades explanation quality by adding noise when structured secu…
Shenao Yan, Shimaa Ahmed, Shan Jin, Sunpreet S. Arora +3 more
The paper introduces CodeScan, a novel black-box framework that detects data poisoning in code generation LLMs by analyzing structural similarities across multiple generations to identify recurring, v…
The paper demonstrates that static malware classifiers often rely on superficial artifacts like packing and metadata rather than true malicious semantics, using the TRUSTEE interpretability tool to di…
Parteek Jamwal, Minghao Shao, Boyuan Chen, Achyuta Muthuvelan +14 more
The paper introduces RAVEN, a Retrieval-Augmented Vulnerability Exploration Network, which uses LLM agents and RAG to automatically generate comprehensive, structured vulnerability analysis reports fo…
This paper addresses the lack of research on adversarial malware generation for Linux ELF binaries by developing a new semantic-preserving generator that achieves a high evasion rate against modern de…
Xavier Cadet, Aditya Vikram Singh, Harsh Mamania, Edward Koh +5 more
The paper introduces a Retrieval-Augmented Generation (RAG) system that uses targeted query filtering and LLM semantic reasoning to accurately and cost-effectively analyze complex cybersecurity incide…
The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…
This paper quantifies the polymorphic capacity of a commercial LLM, demonstrating that it can cheaply generate large populations of structurally diverse, yet behaviorally equivalent, offensive code pa…
Hao Wang, Niels Mündler, Mark Vero, Jingxuan He +2 more
The paper introduces SecPI, a fine-tuning pipeline that teaches reasoning language models (RLMs) to autonomously internalize structured security reasoning, significantly improving secure code generati…
Maosen Zhang, Jianshuo Dong, Boting Lu, Wenyue Li +3 more
The paper introduces LeakDojo, a framework that systematically evaluates RAG leakage risks, finding that stronger LLM instruction-following and query generation are major independent contributors to d…
The paper introduces ABLE, an LLM-based system that automatically generates YARA rules to bypass malware evasion checks in analysis sandboxes, achieving a 79% bypass success rate.
The paper systematically evaluates various defense mechanisms against persistent memory attacks on LLM agents, finding that only tool-gating at the memory layer (Memory Sandbox) effectively mitigates…
Sangjun An, Hyeyeon Park, Yejin Son, Seoksu Lee +1 more
The paper proposes a novel framework to analyze large, obfuscated binaries by decomposing them into structurally coherent units, enabling large-scale dataset generation for LLM-based analysis.
The paper proposes a framework to intentionally evade malware detectors by adding a small number of benign API imports, successfully demonstrating targeted misclassification into a chosen benign categ…