~ similar to 2605.15569v1· 20 results
The paper introduces SLYP, an agentic pipeline that significantly improves the discovery of race condition vulnerabilities in Windows COM binaries and autonomously generates verified proof-of-concept…
Pengyu Sun, Qishu Jin, Enhao Huang, Zifeng Kang +3 more
VIPER-MCP is a novel, end-to-end automated framework that detects and dynamically confirms the exploitability of taint-style vulnerabilities in Model Context Protocol (MCP) servers, achieving high-fid…
Zonghao Ying, Haozheng Wang, Jiangfan Liu, Quanchen Zou +4 more
AgentVisor is a novel defense framework that uses semantic virtualization, inspired by OS principles, to significantly reduce LLM agent vulnerability to prompt injection while maintaining high utility…
ALPS is an automated, vendor-agnostic framework that enforces least privilege in serverless functions by analyzing code and generating precise security policies, achieving high coverage and significan…
The paper proposes a two-stage post-training pipeline to create a small, local LLM agent (PrivEsc-LLM) capable of performing Linux privilege escalation, achieving high success rates while drastically…
Parteek Jamwal, Minghao Shao, Boyuan Chen, Achyuta Muthuvelan +14 more
The paper introduces RAVEN, a Retrieval-Augmented Vulnerability Exploration Network, which uses LLM agents and RAG to automatically generate comprehensive, structured vulnerability analysis reports fo…
Quan Zhang, Lianhang Fu, Lvsi Lian, Gwihwan Go +4 more
The paper introduces GrantBox, a new security sandbox that evaluates how well LLM agents handle real-world tool privileges, finding that agents remain highly vulnerable to sophisticated attacks.
Shenao Wang, Xinyi Hou, Zhao Liu, Yanjie Zhao +4 more
This paper introduces Agentic Workflow Injection (AWI), a new class of vulnerability in LLM-powered GitHub Actions, and presents TaintAWI, a novel taint-analysis tool that identifies hundreds of explo…
This paper analyzes the security of LLM-based autonomous agents by drawing parallels to operating system security, finding that while some vulnerabilities are inherent, many can be mitigated using est…
The paper proposes a Semantic Gateway and a Zero-Trust security model to formally validate and secure autonomous AI agents operating in enterprise systems, achieving a 100% discovery rate of unauthori…
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
This paper demonstrates that by applying systematic prompting and retrieval techniques, local open-weight LLMs can significantly enhance their capabilities to autonomously perform Linux privilege esca…
Aymen Lassoued, Nacef Mbarek, Bechir Dardouri, Bassem Ouni +2 more
The paper introduces VULNSCOUT-C, a compact, specialized transformer model that achieves state-of-the-art performance in C code vulnerability detection while maintaining low inference cost, making it…
Sina Abdollahi, Mohammad M Maheri, Javad Forough, Amir Al Sadi +4 more
AgenTEE is a system that enables the secure, confidential execution of complex LLM agent pipelines directly on edge devices by using isolated confidential virtual machines.
The paper introduces ExploitBench, a capability-graded benchmark that measures the progressive stages of exploitation, demonstrating that while current frontier models can easily trigger bugs, achievi…
Zheng Yan, Jingxiang Weng, Charles Chen, Dengyun Peng +8 more
The paper introduces a new benchmark and decomposition method, Sufficiency-Tightness Decomposition, demonstrating that current coding agents struggle to accurately infer least-privilege authorization,…
Hwiwon Lee, Jiawei Liu, Dongjun Kim, Ziqi Zhang +2 more
The paper introduces SEC-bench Pro, a rigorous benchmark for evaluating LLM-based bug hunting on complex software, finding that even advanced agents struggle with long-horizon security tasks.
The paper introduces codebadger, a Model Context Protocol (MCP) server that integrates Joern's Code Property Graph (CPG) with LLMs, enabling large language models to perform large-scale, semantic prog…
The paper introduces ABLE, an LLM-based system that automatically generates YARA rules to bypass malware evasion checks in analysis sandboxes, achieving a 79% bypass success rate.
Ting Zhang, Yikun Li, Chengran Yang, Ratnadira Widyasari +14 more
TitanCA presents a novel, multi-agent LLM orchestration framework that significantly improves vulnerability discovery by reducing false positives and identifying numerous zero-day vulnerabilities.