~ similar to 2605.11315v1· 20 results
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
This paper benchmarks LLMs for smart contract security analysis, concluding that while LLMs show potential, their reliability is limited by lexical bias and requires integration with traditional stati…
The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…
The paper introduces Neuroforger, a system that combines a new formal specification language with LLMs and type checking to reliably generate and validate concrete violation witnesses (counterexamples…
The paper proposes an automated, standardized framework to empirically compare the security quality of code generated through human-only, LLM-only, and hybrid collaboration methods.
The paper introduces TRAILS~, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…
The paper proposes projectional decoding, a novel framework that integrates a partial graph model alongside text generation to ensure the semantic validity of LLM-generated software artifacts.
This paper empirically evaluates the security of code generated by seven popular LLMs and finds that all evaluated models generate code containing critical or high-severity vulnerabilities.
Fariha Tanjim Shifat, Hariswar Baburaj, Ce Zhou, Jaydeb Sarker +1 more
The paper analyzes GitHub security advisories for LLM-integrated open-source systems, finding that while most vulnerabilities map to existing code-level weaknesses, the architectural risks like Supply…
Bushra Sabir, Shigang Liu, Seung Ick Jang, Sharif Abuadbba +5 more
The paper evaluates multi-LLM strategies for secure code generation, finding that hybrid pipelines combining ensembling, static analysis, and patching achieve the strongest security performance, outpe…
Yujie Ma, Jialin Rong, Chenxi Yang, Lili Quan +3 more
The paper addresses the gap in understanding real-world LLM-in-the-loop vulnerabilities by creating the LLMCVE dataset and demonstrating that these vulnerabilities are significantly harder to repair t…
Analyzing Reddit discussions, the paper finds that while security practitioners see LLMs as useful for boosting productivity, their adoption is constrained by concerns over reliability, verification,…
The paper proposes a general, compiler-integrated framework for secure content composition that minimizes the syntactic difference between secure and insecure coding practices.
Chengyan Ma, Jieke Shi, Ruidong Han, Ye Liu +2 more
The paper introduces SymTEE, an LLM-assisted symbolic execution framework that detects missing input validation vulnerabilities in TEE applications without needing complex, real TEE setups.
Yuwei Liu, Xinyi Wan, Yanhao Wang, Minghua Wang +2 more
KVerus is a retrieval-augmented system that significantly improves the scalability and resilience of formal verification for Rust code by managing complex cross-module dependencies and adapting to cod…
Xiaoyue Lu, Xianglin Yang, Haijun Liu, Jiahao Liu +3 more
The paper introduces POLARIS, a novel framework that systematically generates comprehensive and verifiable safety tests for LLMs by formalizing natural language policies into First-Order Logic and exp…
VeriCWEty proposes an embedding-based framework to detect and classify common software vulnerabilities (CWEs) in Verilog RTL code at both module and line levels, achieving high detection accuracy.
The paper proposes a Semantic Gateway and a Zero-Trust security model to formally validate and secure autonomous AI agents operating in enterprise systems, achieving a 100% discovery rate of unauthori…
Wenjie Qu, Ming Xu, Peiran Wang, Shengfang Zhai +2 more
The paper proposes defining 'intent-to-execution integrity' as the necessary end-to-end correctness property for securing LLM agents, arguing that current defenses are insufficient due to untrusted co…
This paper identifies the 'Format-Reliability Gap'—where LLMs know about code vulnerabilities but generate insecure code anyway—and proposes a localized, per-vulnerability steering vector fix that sig…