~ similar to 2604.27319v1· 20 results
Baicheng Chen, Yu Wang, Ziheng Zhou, Xiangru Liu +3 more
The paper introduces CREBench, a comprehensive benchmark for evaluating Large Language Models (LLMs) on cryptographic binary reverse engineering, finding that while LLMs show promise, human experts st…
The paper introduces CodeGolf Bench, a novel multi-language benchmark using code golf to measure LLMs' ability to generate highly concise and efficient code, showing that reasoning models significantl…
The paper introduces FORGE, a feedback-driven execution system that improves LLM-based binary analysis by interleaving reasoning and tool interaction, achieving high-quality vulnerability discovery on…
The paper introduces an automated framework demonstrating that LLM system instructions are vulnerable to encoding attacks, where structured output requests can bypass safety refusals and leak sensitiv…
The paper introduces codebadger, a Model Context Protocol (MCP) server that integrates Joern's Code Property Graph (CPG) with LLMs, enabling large language models to perform large-scale, semantic prog…
The paper proposes an automated, standardized framework to empirically compare the security quality of code generated through human-only, LLM-only, and hybrid collaboration methods.
The paper introduces an automatic numeric-remapping attack to test the robustness of LLMs on arithmetic word problems, finding that LLMs remain sensitive to small numeric changes in datasets like GSM8…
The paper introduces TRAILS~, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…
The paper introduces a systematic benchmark to test LLMs' ability to recover Indicators of Compromise (IoCs) from JavaScript code, finding that while LLMs handle simple obfuscation well, encryption-ba…
The paper introduces LLM4CodeRE, a domain-adaptive LLM framework that significantly improves bidirectional code reverse engineering by unifying assembly-to-source and source-to-assembly translation.
Han Dai, Soumyakant Priyadarshan, Abdullah Imran, Ruoyu Wang +1 more
SCRIBE is a novel framework that enables reliable source-level patching of binaries by performing 'binary-aware' recompilation, successfully resolving syntactic and semantic inaccuracies inherent in d…
Elevator is a novel, deterministic binary translator that statically translates entire x86-64 executables to AArch64 by considering all possible interpretations of every byte, eliminating the need for…
This paper systematically studies how soft errors propagate during Large Language Model (LLM) inference using a novel fault-injection framework, providing critical insights and mitigation strategies f…
The paper proposes CoDe-R, a two-stage framework that significantly improves the accuracy and re-executability of decompiled code generated by LLMs, achieving a new SOTA in the lightweight regime.
The paper introduces a hybrid system, HYBRIDSOURCETRACKER (HST), that combines vector search and Winnowing fingerprinting to achieve scalable, high-precision provenance tracking for code generated by…
Puzhuo Liu, Yuhan Huang, Jianlei Chi, Peng Di +1 more
The paper introduces DEBENCH, a novel framework that evaluates binary decompilers based on three orthogonal dimensions—readability, recompilability, and functionality—revealing that functional recover…
The paper proposes projectional decoding, a novel framework that integrates a partial graph model alongside text generation to ensure the semantic validity of LLM-generated software artifacts.
This paper benchmarks LLMs for smart contract security analysis, concluding that while LLMs show potential, their reliability is limited by lexical bias and requires integration with traditional stati…
The paper introduces the first byte-native Large Language Model (LLM) capable of analyzing raw executable binary data, achieving high accuracy in tasks like malware and architecture classification.
The paper identifies a universal, statistically predictable distribution (Mandelbrot) governing LLM outputs, enabling a highly efficient, model-agnostic scoring primitive for provenance and quality as…