~ similar to 2604.04288v2· 20 results
Yujie Ma, Jialin Rong, Chenxi Yang, Lili Quan +3 more
The paper addresses the gap in understanding real-world LLM-in-the-loop vulnerabilities by creating the LLMCVE dataset and demonstrating that these vulnerabilities are significantly harder to repair t…
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
Bonan Ruan, Yeqi Fu, Chuqi Zhang, Jiahao Liu +2 more
This paper introduces Heimdallr, a novel framework that characterizes and detects LLM-induced security risks by analyzing the full execution chain of LLM integrations within GitHub CI workflows.
Analyzing Reddit discussions, the paper finds that while security practitioners see LLMs as useful for boosting productivity, their adoption is constrained by concerns over reliability, verification,…
This paper empirically evaluates the security of code generated by seven popular LLMs and finds that all evaluated models generate code containing critical or high-severity vulnerabilities.
VulGD is a dynamic, open-access graph database that aggregates cybersecurity data from multiple sources and uses LLM embeddings to improve vulnerability representation and risk assessment.
Zhihao Chen, Ying Zhang, Yi Liu, Gelei Deng +6 more
This study conducts a large-scale empirical analysis of third-party LLM agent skills, identifying that credential leakage is a pervasive, cross-modal issue primarily caused by debug logging and result…
The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…
The paper establishes a standardized security assessment framework and develops a multi-layered defensive system, demonstrating that systematic testing and external defenses are crucial for safe LLM d…
Bushra Sabir, Shigang Liu, Seung Ick Jang, Sharif Abuadbba +5 more
The paper evaluates multi-LLM strategies for secure code generation, finding that hybrid pipelines combining ensembling, static analysis, and patching achieve the strongest security performance, outpe…
The paper introduces codebadger, a Model Context Protocol (MCP) server that integrates Joern's Code Property Graph (CPG) with LLMs, enabling large language models to perform large-scale, semantic prog…
This paper introduces UPAttack, a novel threat model demonstrating that focusing on explicit usability requirements can cause LLMs to generate insecure code by neglecting implicit security constraints…
The paper introduces GuardPhish, a large-scale dataset and evaluation framework, demonstrating that even high-performing open-source LLMs can generate actionable phishing content despite accurate inte…
The paper proposes an attestation-aware promotion gate to mitigate supply-chain risks in LLM pipelines by cryptographically verifying and enforcing claims about training and release artifacts before d…
This paper benchmarks LLMs for smart contract security analysis, concluding that while LLMs show potential, their reliability is limited by lexical bias and requires integration with traditional stati…
Zirui Chen, Qi Zhan, Jiayuan Zhou, Xing Hu +2 more
This paper conducts a large-scale empirical study demonstrating that Java library exploits can accurately identify affected versions, achieving high recall and precision, and proposes strategies for e…
This study conducts a large-scale longitudinal analysis of CodeQL, finding that while the tool is effective at detecting vulnerabilities, its detection capabilities are not guaranteed to be stable acr…
The paper empirically analyzes the susceptibility of seven widely used AI-assisted development tools (MCP clients) to prompt injection via tool-poisoning, revealing significant disparities in their se…
The paper proposes an automated, standardized framework to empirically compare the security quality of code generated through human-only, LLM-only, and hybrid collaboration methods.
This paper replicates and extends a study on Java security API misuse in LLMs, finding that while newer models improve performance, the misuse risk persists and is significantly mitigated by external…