~ similar to 2605.05969v1· 20 results
Fariha Tanjim Shifat, Hariswar Baburaj, Ce Zhou, Jaydeb Sarker +1 more
The paper analyzes GitHub security advisories for LLM-integrated open-source systems, finding that while most vulnerabilities map to existing code-level weaknesses, the architectural risks like Supply…
Shenao Wang, Xinyi Hou, Zhao Liu, Yanjie Zhao +4 more
This paper introduces Agentic Workflow Injection (AWI), a new class of vulnerability in LLM-powered GitHub Actions, and presents TaintAWI, a novel taint-analysis tool that identifies hundreds of explo…
Analyzing Reddit discussions, the paper finds that while security practitioners see LLMs as useful for boosting productivity, their adoption is constrained by concerns over reliability, verification,…
Yujie Ma, Jialin Rong, Chenxi Yang, Lili Quan +3 more
The paper addresses the gap in understanding real-world LLM-in-the-loop vulnerabilities by creating the LLMCVE dataset and demonstrating that these vulnerabilities are significantly harder to repair t…
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
VulGD is a dynamic, open-access graph database that aggregates cybersecurity data from multiple sources and uses LLM embeddings to improve vulnerability representation and risk assessment.
This paper introduces UPAttack, a novel threat model demonstrating that focusing on explicit usability requirements can cause LLMs to generate insecure code by neglecting implicit security constraints…
The paper proposes an attestation-aware promotion gate to mitigate supply-chain risks in LLM pipelines by cryptographically verifying and enforcing claims about training and release artifacts before d…
The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…
This paper benchmarks LLMs for smart contract security analysis, concluding that while LLMs show potential, their reliability is limited by lexical bias and requires integration with traditional stati…
The paper introduces Prompt Control-Flow Integrity (PCFI), a priority-aware runtime defense that models LLM prompts as structured segments to intercept prompt injection attacks with high accuracy and…
Neil Fendley, Zhengyu Liu, Aonan Guan, Jiacheng Zhong +1 more
The paper introduces JAW, a novel framework that demonstrates how adversaries can hijack agentic workflows on automation platforms like GitHub Actions by manipulating inputs based on context-grounded…
Minfeng Qi, Tianqing Zhu, Zijie Xu, Congcong Zhu +2 more
The paper introduces CAESAR, a novel multi-agent framework that coordinates LLM agents across five specialized roles to improve success rates and stability in complex, multi-stage cyber intrusion task…
Kevin Lira, Baldoino Fonseca, Davy Baía, Márcio Ribeiro +1 more
This study assesses the effectiveness and cost of four modern LLMs in detecting vulnerabilities that span multiple functions (interprocedural dependencies), finding that Gemini 3 Flash offers strong c…
The paper introduces Heimdall, an automated pipeline that uses LLMs and formal verification to safely and automatically migrate legacy, potentially buggy eBPF programs written in C to memory-safe Rust…
The paper introduces SecureBreak, a manually annotated, safety-oriented dataset designed to help detect harmful outputs from large language models (LLMs) that bypass existing security alignments.
Bushra Sabir, Shigang Liu, Seung Ick Jang, Sharif Abuadbba +5 more
The paper evaluates multi-LLM strategies for secure code generation, finding that hybrid pipelines combining ensembling, static analysis, and patching achieve the strongest security performance, outpe…
This paper empirically evaluates the security of code generated by seven popular LLMs and finds that all evaluated models generate code containing critical or high-severity vulnerabilities.
The paper introduces ASTRAL, a multimodal LLM-driven framework that reconstructs and analyzes fragmented cyber-physical system architectures to enable comprehensive and quantitative security risk asse…
The paper introduces GuardPhish, a large-scale dataset and evaluation framework, demonstrating that even high-performing open-source LLMs can generate actionable phishing content despite accurate inte…