Papers similar to 2605.30054

~ similar to 2605.30054· 20 results

cs.SEcs.AIRecentMay 28, 2026

Inferring Code Correctness from Specification

The paper introduces TRAILS~, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…

View →

cs.CRcs.LGRecentApr 17, 2026

Surgical Repair of Insecure Code Generation in LLMs

Gustavo Sandoval, Brendan Dolan-Gavitt, Siddharth Garg

This paper identifies the 'Format-Reliability Gap'—where LLMs know about code vulnerabilities but generate insecure code anyway—and proposes a localized, per-vulnerability steering vector fix that sig…

View →

cs.SEcs.AIcs.CRRecentMay 11, 2026

Natural Language based Specification and Verification

Zhaorui Li, Chengyu Song

This paper proposes using large language models (LLMs) to generate and compositionally verify software implementations directly from natural language specifications, showing promising preliminary resu…

View →

cs.CRcs.AIcs.LGRecentMay 22, 2026

An Empirical Evaluation of LLM-Generated Code Security Across Prompting Methods

Mohammed Kharma, Ahmed Sabbah, Mohammad Alkhanafseh, Mohammad Hammoudeh +1 more

The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…

View →

cs.DCcs.AIRecentJun 1, 2026

Not All Errors Are Equal: A Systematic Study of Error Propagation in Large Language Model Inference

Yafan Huang, Sheng Di, Guanpeng Li

This paper systematically studies how soft errors propagate during Large Language Model (LLM) inference using a novel fault-injection framework, providing critical insights and mitigation strategies f…

View →

cs.CRRecentApr 16, 2026

Feedback-Driven Execution for LLM-Based Binary Analysis

XiangRui Zhang, Qiang Li, Haining Wang

The paper introduces FORGE, a feedback-driven execution system that improves LLM-based binary analysis by interleaving reasoning and tool interaction, achieving high-quality vulnerability discovery on…

View →

cs.CRcs.AIcs.LGRecentMay 22, 2026

Enhancing Reliability in LLM-Based Secure Code Generation

Mohammed F. Kharma, Mohammad Alkhanafseh, Ahmed Sabbah, David Mohaisen

The paper introduces the Mitigation-Aware Chain-of-Thought (MA-CoT) framework, which significantly enhances the security reliability of code generated by LLMs across multiple languages and models.

View →

cs.SEcs.AIcs.CRRecentApr 14, 2026

CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference

Qiang Zhang, Zhongnian Li

The paper proposes CoDe-R, a two-stage framework that significantly improves the accuracy and re-executability of decompiled code generated by LLMs, achieving a new SOTA in the lightweight regime.

View →

cs.CRcs.SERecentMay 29, 2026

How to Compare the Security of Code Written by Humans to LLM-generated Code

Rebecca Balebako, Jasmine Egl

The paper proposes an automated, standardized framework to empirically compare the security quality of code generated through human-only, LLM-only, and hybrid collaboration methods.

View →

cs.CRcs.AIRecentMay 11, 2026

Benchmarking LLM-Based Static Analysis for Secure Smart Contract Development: Reliability, Limitations, and Potential Hybrid Solutions

Stefan-Claudiu Susan, Andrei Arusoaie, Dorel Lucanu

This paper benchmarks LLMs for smart contract security analysis, concluding that while LLMs show potential, their reliability is limited by lexical bias and requires integration with traditional stati…

View →

cs.CRcs.LGcs.SERecentApr 30, 2026

REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)

Jun Yeon Won, Xin Jin, Shiqing Ma, Zhiqiang Lin

The paper introduces REBench, a comprehensive, standardized benchmark dataset designed to enable fair and rigorous evaluation of Large Language Models (LLMs) on complex binary reverse engineering task…

View →

cs.CRcs.AIRecentApr 7, 2026

LLM4CodeRE: Generative AI for Code Decompilation Analysis and Reverse Engineering

Hamed Jelodar, Samita Bai, Tochukwu Emmanuel Nwankwo, Parisa Hamedi +3 more

The paper introduces LLM4CodeRE, a domain-adaptive LLM framework that significantly improves bidirectional code reverse engineering by unifying assembly-to-source and source-to-assembly translation.

View →

cs.SEcs.AIRecentMay 28, 2026

Code-QA-Bench: Separating Code Reasoning from Documentation Memorization in Repository-Level QA

Jun Zhang, JianYing Qu, Hanwen Du, Zhongkai Sun +2 more

The paper introduces Code-QA-Bench, a novel framework that rigorously separates genuine code reasoning from mere documentation memorization in repository-level code understanding benchmarks.

View →

cs.CRcs.AIRecentApr 1, 2026

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Anubhab Sahu, Diptisha Samanta, Reza Soosahabi

The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…

View →

cs.SEcs.AIcs.CRRecentApr 12, 2026

Verify Before You Fix: Agentic Execution Grounding for Trustworthy Cross-Language Code Analysis

Jugal Gajjar

The paper introduces an execution-grounded, cross-language framework that significantly improves the reliability of LLM-driven code vulnerability analysis by ensuring that all proposed fixes are confi…

View →

cs.CLcs.AIcs.LGRecentMay 27, 2026

Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification

Dylan Bouchard, Mohit Singh Chauhan, Zeya Ahmad, Ho-Kyeong Ra

The paper introduces functional entropy, a code-specific uncertainty quantification method, which successfully predicts functional correctness in LLM-generated code by replacing natural language seman…

View →

cs.CLcs.AIRecentMay 31, 2026

Hybrid Verified Decoding: Learning to Allocate Verification in Speculative Decoding

Xin Su, Dawid Majchrowski, Fangyuan Yu, Vanshil Atul Shah +4 more

The paper introduces Hybrid Verified Decoding, a method that predicts the acceptance length of a cache draft to intelligently select between cache verification and model-based drafting, achieving sign…

View →

cs.PLcs.AIcs.CLRecentMay 27, 2026

FPMoE: A Sparse Mixture-of-Experts Approach to Functional Code Generation

Loc Pham, Lang Hong Nguyet Anh, Thanh Le-Cong

FPMoE introduces a sparse Mixture-of-Experts (MoE) architecture to improve functional code generation across multiple functional programming languages, achieving state-of-the-art performance with fewe…

View →

cs.LGcs.AIRecentMay 27, 2026

Learning the Error Patterns of Language Models

Jinwoo Kim, Taylor Berg-KirkPatrick, Loris D'Antoni

The paper introduces prefix filters and an algorithm (Palla) to systematically learn and apply specific error patterns in Large Language Models, significantly improving constrained generation tasks li…

View →

cs.SEcs.AIRecentMay 28, 2026

CodeGolf Bench: A Multi-Language Benchmark for Evaluating Concise Code Generation Capabilities of Large Language Models

Vedant Padwal

The paper introduces CodeGolf Bench, a novel multi-language benchmark using code golf to measure LLMs' ability to generate highly concise and efficient code, showing that reasoning models significantl…

View →