Papers similar to 2606.11864

~ similar to 2606.11864· 20 results

cs.SEcs.AIRecentMay 28, 2026

Code-QA-Bench: Separating Code Reasoning from Documentation Memorization in Repository-Level QA

Jun Zhang, JianYing Qu, Hanwen Du, Zhongkai Sun +2 more

The paper introduces Code-QA-Bench, a novel framework that rigorously separates genuine code reasoning from mere documentation memorization in repository-level code understanding benchmarks.

View →

cs.CRcs.LGcs.SERecentMay 5, 2026

Identifier-Free Code Embedding Models for Scalable Search

Eric Wolos, Michael Doyle

The paper proposes and evaluates a novel embedding model for bidirectional function association between source code and decompiled/stripped code, significantly outperforming existing models.

View →

cs.SEcs.AIRecentMay 28, 2026

CodeGolf Bench: A Multi-Language Benchmark for Evaluating Concise Code Generation Capabilities of Large Language Models

Vedant Padwal

The paper introduces CodeGolf Bench, a novel multi-language benchmark using code golf to measure LLMs' ability to generate highly concise and efficient code, showing that reasoning models significantl…

View →

cs.SEcs.AIcs.CLRecentJun 4, 2026

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

Liliana Hotsko, Yinxi Li, Yuntian Deng, Pengyu Nie

Code2LoRA introduces a hypernetwork framework to efficiently inject repository-specific knowledge into code language models using LoRA adapters, supporting both static and evolving codebases.

View →

cs.SEcs.AIcs.HCRecentMay 28, 2026

How Coding Agents Fail Their Users: A Large-Scale Analysis of Developer-Agent Misalignment in 20,574 Real-World Sessions

Ningzhi Tang, Chaoran Chen, Gelei Xu, Yiyu Shi +4 more

This study analyzes over 20,000 real-world coding sessions to show that AI coding agents frequently fail users through subtle misalignment, requiring constant manual correction even when major system…

View →

cs.AIcs.IRcs.LGRecentMay 28, 2026

CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval

Vaishali Senthil, Ashutosh Hathidara, Sebastian Schreiber

CoHyDE introduces an iterative co-training framework that jointly optimizes an LLM rewriter and a dense encoder, significantly improving tool retrieval accuracy for LLM agents, especially on vague que…

View →

cs.SEcs.AIcs.IRRecentMay 27, 2026

Efficient and Scalable Provenance Tracking for LLM-Generated Code Snippets

Andrea Gurioli, Davide D'Ascenzo, Federico Pennino, Maurizio Gabbrielli +1 more

The paper introduces a hybrid system, HYBRIDSOURCETRACKER (HST), that combines vector search and Winnowing fingerprinting to achieve scalable, high-precision provenance tracking for code generated by…

View →

cs.AIRecentMay 28, 2026

RAISE: RAG Design as an Architecture Search Problem

Zhen Chen, Yibing Liu, Weihao Xie, Yu Liang +2 more

The paper proposes formulating RAG design as an architecture search problem and introduces RAISE, a comprehensive framework and benchmark for systematically optimizing RAG hyperparameters.

View →

cs.AIcs.CLRecentMay 28, 2026

Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems

Lorenz Kutschka, Bernhard Geiger

This study benchmarks token-optimized formats (TOON and TRON) against JSON in end-to-end agentic AI systems, finding that TRON significantly reduces token overhead with minimal performance degradation…

View →

cs.CLcs.AIcs.IRRecentMay 28, 2026

Entity-Collision: A Stratified Protocol for Attributing Retrieval Lift in Agent Memory

Youwang Deng

The paper introduces Entity-Collision, a rigorous protocol that separates genuine retrieval lift from simple lexical overlap, demonstrating that embedder performance depends critically on the query ty…

View →

cs.SEcs.AIcs.CLRecentMay 31, 2026

BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution

Yangzhen Wu, Aaron J. Li, Wenjie Ma, Li Cao +9 more

BenchEvolver introduces a solution-centric evolutionary framework to automatically transform saturated coding benchmarks into significantly harder, high-quality, and diverse evaluation suites.

View →

cs.CRcs.AIRecentJun 3, 2026

Search-Time Contamination in Deep Research Agents: Measuring Performance Inflation in Public Benchmark Evaluation

Yongjie Wang, Xinyue Zhang, Kunhong Yao, Zhiwei Zeng +3 more

The paper introduces the concept of Search-Time Contamination (STC), demonstrating that deep research agents can leak information from public benchmarks via web search, leading to an overestimation of…

View →

cs.SEcs.AIRecentMay 28, 2026

Inferring Code Correctness from Specification

Tambon Florian, Papadakis Mike

The paper introduces TRAILS~, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…

View →

cs.AIRecentMay 31, 2026

"Skill issues'': data-centric optimization of lakehouse agents

Nicole Rose Schneider, Davide Ghilardi, Giacomo Piccinini, Jacopo Tagliabue

The paper introduces a data-centric optimization pipeline to improve coding agents' ability to interact with a branching lakehouse, showing significant accuracy gains by treating agent evaluation as a…

View →

cs.SEcs.AIcs.CRRecentApr 14, 2026

CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference

Qiang Zhang, Zhongnian Li

The paper proposes CoDe-R, a two-stage framework that significantly improves the accuracy and re-executability of decompiled code generated by LLMs, achieving a new SOTA in the lightweight regime.

View →

cs.CLRecentMay 31, 2026

LongAttnComp: Cross-Family Context Compression for Long-Context Reasoning

Mengmeng Ji, Ravi Shanker Raju, Jonathan Lingjie Li, Chen Wu

LongAttnComp introduces a novel, two-stage fine-tuning framework for context compression that significantly improves long-context reasoning performance, matching or exceeding full-context accuracy on…

View →

cs.IRcs.AIRecentMay 29, 2026

DynaTree: Dynamic Agentic Retrieval Tree for Time-Sensitive News Retrieval

Siyuan Qi, Xinyuan Wang, Yingxuan Yang, Haochuan Guo +4 more

DynaTree introduces a two-stage framework that pre-constructs a reusable retrieval tree offline using coordinated agents, allowing for efficient, structure-aware, and highly effective time-sensitive n…

View →

cs.CLcs.AIRecentMay 27, 2026

VibeSearchBench: Benchmarking Long-horizon Proactive Search in the Wild

Xiaohongshu Inc

The paper introduces VibeSearchBench, a new benchmark designed to evaluate long-horizon, proactive search capabilities, demonstrating that current state-of-the-art LLM agents are still significantly i…

View →

cs.IRcs.AIRecentMay 30, 2026

Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

Md Zarif Ul Alam, Alireza Salemi, Hamed Zamani

Critic-R introduces a novel framework that uses a critic model to provide natural language introspective feedback, significantly improving the performance of agentic search systems by optimizing retri…

View →

cs.SEcs.CRRecentMay 10, 2026

Evaluating Tool Cloning in Agentic-AI Ecosystems

Taein Kim, David Jiang, Yuepeng Hu, Yuqi Jia +1 more

The paper presents a large-scale study demonstrating that tool cloning is a pervasive and severe source of hidden duplication in agent-tool ecosystems, necessitating changes in how tool diversity is m…

View →