~ similar to 2605.29822· 20 results
The paper proposes projectional decoding, a novel framework that integrates a partial graph model alongside text generation to ensure the semantic validity of LLM-generated software artifacts.
The paper introduces FVSpec, a large-scale benchmark that translates thousands of real-world Python property-based tests into formal Lean 4 specifications to evaluate AI models for formal software ver…
The paper proposes an automated, standardized framework to empirically compare the security quality of code generated through human-only, LLM-only, and hybrid collaboration methods.
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
Jun Zhang, JianYing Qu, Hanwen Du, Zhongkai Sun +2 more
The paper introduces Code-QA-Bench, a novel framework that rigorously separates genuine code reasoning from mere documentation memorization in repository-level code understanding benchmarks.
The paper introduces FORGE, a feedback-driven execution system that improves LLM-based binary analysis by interleaving reasoning and tool interaction, achieving high-quality vulnerability discovery on…
The paper introduces a hybrid system, HYBRIDSOURCETRACKER (HST), that combines vector search and Winnowing fingerprinting to achieve scalable, high-precision provenance tracking for code generated by…
The paper introduces CodeGolf Bench, a novel multi-language benchmark using code golf to measure LLMs' ability to generate highly concise and efficient code, showing that reasoning models significantl…
This study formally verified 3,500 AI-generated code artifacts and found that a majority (55.8%) contain exploitable security vulnerabilities, regardless of the LLM used.
The paper introduces the Mitigation-Aware Chain-of-Thought (MA-CoT) framework, which significantly enhances the security reliability of code generated by LLMs across multiple languages and models.
The paper demonstrates that using Reinforcement Learning from Verifiable Rewards (RLVR) significantly improves small language models' functional correctness in code generation, particularly when combi…
The paper introduces an execution-grounded, cross-language framework that significantly improves the reliability of LLM-driven code vulnerability analysis by ensuring that all proposed fixes are confi…
The paper proposes a trust schema and verification framework to ensure that agent skills, which augment LLMs, are rigorously verified before deployment, thereby making human-in-the-loop oversight scal…
This paper proposes using large language models (LLMs) to generate and compositionally verify software implementations directly from natural language specifications, showing promising preliminary resu…
This paper analyzes large-scale reasoning traces from LLM-based binary vulnerability analysis, identifying four structured, token-level implicit patterns that govern how LLMs explore code paths.
Hao Wang, Niels Mündler, Mark Vero, Jingxuan He +2 more
The paper introduces SecPI, a fine-tuning pipeline that teaches reasoning language models (RLMs) to autonomously internalize structured security reasoning, significantly improving secure code generati…
SEMBridge is a tagless-final framework that allows a single executable object program to generate multiple program semantics, including weakest-precondition and bounded-checking interpretations, ensur…
The paper introduces SPECA, an LLM-driven framework that audits distributed protocols by deriving and enforcing security properties from natural-language specifications, enabling cross-implementation…
This paper identifies the 'Format-Reliability Gap'—where LLMs know about code vulnerabilities but generate insecure code anyway—and proposes a localized, per-vulnerability steering vector fix that sig…
The paper introduces LCC-LLM, a code-centric framework and dataset that significantly improves the reliability of malware attribution and static analysis by grounding LLM reasoning in comprehensive, m…