~ similar to 2604.11506v1· 20 results
The paper introduces RedShell, a hardware-efficient framework that uses fine-tuned LLMs to automate the generation of syntactically and semantically valid offensive PowerShell code for pentesting.
The paper introduces a novel, large-scale dataset of vulnerable code snippets linked to CAPEC and CWE, generated using advanced LLMs, to improve automatic vulnerability detection.
The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…
The paper introduces ABLE, an LLM-based system that automatically generates YARA rules to bypass malware evasion checks in analysis sandboxes, achieving a 79% bypass success rate.
The paper proposes an automated, standardized framework to empirically compare the security quality of code generated through human-only, LLM-only, and hybrid collaboration methods.
Hao Wang, Niels Mündler, Mark Vero, Jingxuan He +2 more
The paper introduces SecPI, a fine-tuning pipeline that teaches reasoning language models (RLMs) to autonomously internalize structured security reasoning, significantly improving secure code generati…
Shenao Yan, Shimaa Ahmed, Shan Jin, Sunpreet S. Arora +3 more
The paper introduces CodeScan, a novel black-box framework that detects data poisoning in code generation LLMs by analyzing structural similarities across multiple generations to identify recurring, v…
The paper introduces MOSAIC-Bench, a benchmark demonstrating that coding agents can ship exploitable code by complying with seemingly innocuous, staged tasks, a vulnerability that is not easily mitiga…
Bushra Sabir, Shigang Liu, Seung Ick Jang, Sharif Abuadbba +5 more
The paper evaluates multi-LLM strategies for secure code generation, finding that hybrid pipelines combining ensembling, static analysis, and patching achieve the strongest security performance, outpe…
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
Khang Tran, Yazan Boshmaf, Issa Khalil, NhatHai Phan +2 more
The paper introduces Poison-with-Style (PwS), a stealthy model poisoning attack that exploits developers' inherent code styles as covert triggers to make Code LLMs generate vulnerable code without exp…
Houjun Liu, Lisa Einstein, John Yang, Joachim Baumann +4 more
SecureForge is an automated pipeline that significantly reduces cybersecurity vulnerabilities in LLM-generated code by optimizing system prompts, achieving up to a 48% reduction in output vulnerabilit…
Red-MIRROR is a novel multi-agent LLM system that automates complex web penetration testing by integrating a memory-reflection backbone, achieving superior performance on industry benchmarks.
Yifei Ge, Zhenpeng Chen, Weisong Sun, Yuchen Chen +6 more
The paper proposes a novel test-driven pipeline that simulates realistic code generation scenarios to detect privacy leaks in LLMs, achieving a 2.56x increase in detected leakage compared to existing…
The paper introduces an AI red teaming agent that drastically reduces the time and effort required for security testing by allowing operators to define complex attack goals using natural language, com…
The paper introduces the first byte-native Large Language Model (LLM) capable of analyzing raw executable binary data, achieving high accuracy in tasks like malware and architecture classification.
The paper introduces LLM4CodeRE, a domain-adaptive LLM framework that significantly improves bidirectional code reverse engineering by unifying assembly-to-source and source-to-assembly translation.
Parteek Jamwal, Minghao Shao, Boyuan Chen, Achyuta Muthuvelan +14 more
The paper introduces RAVEN, a Retrieval-Augmented Vulnerability Exploration Network, which uses LLM agents and RAG to automatically generate comprehensive, structured vulnerability analysis reports fo…
Ismail Hossain, Sai Puppala, Zhuoran Lu, Sajedul Talukder +1 more
The paper introduces SkillVetBench, a novel two-stage benchmark that effectively detects and verifies malicious behavior in open agentic skill ecosystems, significantly outperforming existing static a…
Ismail Hossain, Sai Puppala, Zhuoran Lu, Sajedul Talukder +1 more
The paper introduces SkillVetBench, a novel two-stage benchmark that effectively detects and verifies malicious behavior hidden within open agentic skills, significantly outperforming static and seman…