20 results for “code retrieval”
CS papers onlyHybrid search: Keyword + semantic, ranked by combined score.ⓘ
Want pure semantic search? Try claim verification →
Fuwei Zhang, Yanzhao Zhang, Mingxin Li, Dingkun Long +4 more
This paper introduces CORE-Bench, a comprehensive benchmark for code retrieval in agentic coding.
AsmRAG is a novel framework that improves malware detection by treating it as an evidence-based retrieval task using a code-specialized LLM, achieving high accuracy while providing transparent forensi…
Parteek Jamwal, Minghao Shao, Boyuan Chen, Achyuta Muthuvelan +14 more
The paper introduces RAVEN, a Retrieval-Augmented Vulnerability Exploration Network, which uses LLM agents and RAG to automatically generate comprehensive, structured vulnerability analysis reports fo…
The paper introduces LCC-LLM, a code-centric framework and dataset that significantly improves the reliability of malware attribution and static analysis by grounding LLM reasoning in comprehensive, m…
This paper presents a cryptanalytic attack demonstrating that a specific code-based Private Information Retrieval (PIR) scheme can be broken, allowing the server to efficiently determine the requested…
The paper proposes and evaluates a novel embedding model for bidirectional function association between source code and decompiled/stripped code, significantly outperforming existing models.
Jun Zhang, JianYing Qu, Hanwen Du, Zhongkai Sun +2 more
The paper introduces Code-QA-Bench, a novel framework that rigorously separates genuine code reasoning from mere documentation memorization in repository-level code understanding benchmarks.
The paper introduces CodeGolf Bench, a novel multi-language benchmark using code golf to measure LLMs' ability to generate highly concise and efficient code, showing that reasoning models significantl…
Code2LoRA introduces a hypernetwork framework to efficiently inject repository-specific knowledge into code language models using LoRA adapters, supporting both static and evolving codebases.
The paper introduces a hybrid system, HYBRIDSOURCETRACKER (HST), that combines vector search and Winnowing fingerprinting to achieve scalable, high-precision provenance tracking for code generated by…
The paper introduces LLM4CodeRE, a domain-adaptive LLM framework that significantly improves bidirectional code reverse engineering by unifying assembly-to-source and source-to-assembly translation.
CoHyDE introduces an iterative co-training framework that jointly optimizes an LLM rewriter and a dense encoder, significantly improving tool retrieval accuracy for LLM agents, especially on vague que…
This systematic mapping survey reviews label-efficient approaches for code vulnerability detection, synthesizing five paradigm families and providing a decision guide to navigate trade-offs.
The paper introduces TRAILS~, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…
The paper introduces codebadger, a Model Context Protocol (MCP) server that integrates Joern's Code Property Graph (CPG) with LLMs, enabling large language models to perform large-scale, semantic prog…
The paper introduces Sieve, a system that uses a large language model (LLM) to generate executable query code from natural language security questions, significantly improving the ability to perform c…
This paper proposes using offline reinforcement learning (RL) as an efficient alternative to online RL for post-training code-generating LLMs, demonstrating its effectiveness, especially for smaller m…
Chengtao Gan, Zhiqiang Liu, Long Jin, Yushan Zhu +2 more
CRAFTQA introduces a novel adaptive, code-driven framework that significantly enhances complex structured data reasoning by dynamically generating custom code functions beyond predefined operations.
Shenao Yan, Shimaa Ahmed, Shan Jin, Sunpreet S. Arora +3 more
The paper introduces CodeScan, a novel black-box framework that detects data poisoning in code generation LLMs by analyzing structural similarities across multiple generations to identify recurring, v…
Han Dai, Soumyakant Priyadarshan, Abdullah Imran, Ruoyu Wang +1 more
SCRIBE is a novel framework that enables reliable source-level patching of binaries by performing 'binary-aware' recompilation, successfully resolving syntactic and semantic inaccuracies inherent in d…