~ similar to 2606.00220· 20 results
Yuwei Liu, Xinyi Wan, Yanhao Wang, Minghua Wang +2 more
KVerus is a retrieval-augmented system that significantly improves the scalability and resilience of formal verification for Rust code by managing complex cross-module dependencies and adapting to cod…
The paper introduces Heimdall, an automated pipeline that uses LLMs and formal verification to safely and automatically migrate legacy, potentially buggy eBPF programs written in C to memory-safe Rust…
The paper introduces TRAILS~, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…
AutoSOUP is a system that automates component-level memory-safety verification by generating Safety-Oriented Unit Proofs, leveraging a hybrid LLM-based architecture to overcome manual workflow limitat…
OverrideFuzz is a novel semantic-aware grammar fuzzer designed to test script-language runtimes by specifically modeling and exploiting complex behaviors like method overriding and dynamic rebinding,…
The paper introduces FVSpec, a large-scale benchmark that translates thousands of real-world Python property-based tests into formal Lean 4 specifications to evaluate AI models for formal software ver…
The paper proposes a general, compiler-integrated framework for secure content composition that minimizes the syntactic difference between secure and insecure coding practices.
The paper introduces codebadger, a Model Context Protocol (MCP) server that integrates Joern's Code Property Graph (CPG) with LLMs, enabling large language models to perform large-scale, semantic prog…
The paper proposes projectional decoding, a novel framework that integrates a partial graph model alongside text generation to ensure the semantic validity of LLM-generated software artifacts.
This study formally verified 3,500 AI-generated code artifacts and found that a majority (55.8%) contain exploitable security vulnerabilities, regardless of the LLM used.
The paper introduces FORGE, a feedback-driven execution system that improves LLM-based binary analysis by interleaving reasoning and tool interaction, achieving high-quality vulnerability discovery on…
Yaoyu Zhao, Yichen Xu, Oliver Bračevac, Cao Nguyen Pham +2 more
The paper introduces LACUNA, a novel programming model that allows LLM agents to write code that shapes the runtime environment while maintaining strong type-checking safety guarantees.
Jun Zhang, JianYing Qu, Hanwen Du, Zhongkai Sun +2 more
The paper introduces Code-QA-Bench, a novel framework that rigorously separates genuine code reasoning from mere documentation memorization in repository-level code understanding benchmarks.
FunFuzz introduces a multi-island evolutionary fuzzing framework that uses LLMs to generate structured inputs, achieving superior compiler coverage and discovering more unique failures compared to exi…
Huihui Huang, Jieke Shi, Bo Wang, Zhou Yang +1 more
MemHint is a neuro-symbolic static analysis pipeline that significantly improves memory leak detection in C/C++ by combining LLM semantic understanding with Z3 symbolic reasoning, detecting more leaks…
The paper presents a novel technology that uses zero-knowledge proofs to formally verify a software system's correctness against a public specification without revealing the system's internal details.
This paper proposes using large language models (LLMs) to generate and compositionally verify software implementations directly from natural language specifications, showing promising preliminary resu…
Elevator is a novel, deterministic binary translator that statically translates entire x86-64 executables to AArch64 by considering all possible interpretations of every byte, eliminating the need for…
The paper introduces ProofLoop, a novel ReAct agent that uses a solver-in-the-loop approach to automatically generate and formally verify SystemVerilog Assertions (SVA) from natural language specifica…
Filament is a novel, compiler-agnostic static information-flow control (IFC) library for Rust that enables fine-grained, Denning-style tracking of both explicit and implicit data flows with minimal pr…