Papers similar to 2606.01246

~ similar to 2606.01246· 20 results

cs.CLcs.AIRecentMay 28, 2026

EviLink: Multi-Path Schema Linking with Uncertainty-Guided Evidence Acquisition for Large-Scale Text-to-SQL

Huawei Zheng, Sen Yang, Zhaorui Yang, Yuhui Zhang +11 more

EviLink addresses the ambiguity of schema linking in Text-to-SQL by treating it as an uncertainty-aware inference over multiple plausible SQL paths, significantly improving recall and efficiency.

View →

cs.DBcs.AIEmpiricalRecentJun 10, 2026

TAHOE: Text-to-SQL with Automated Hint Optimization from Experience

Zhiyi Chen, Jie Song, Peng Li

The paper presents Tahoe, a system that optimizes Text-to-SQL performance through dynamic data management and hint learning.

View →

cs.AIRecentJun 1, 2026

BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning

Shannon Serrao, Soumitra Chatterjee, Dorina Strori, Abhishek Sharma +1 more

BADGER is a unified, production-grade evaluation framework that integrates text-to-SQL assessment with agentic behavior evaluation, significantly outperforming existing benchmarks on industry queries.

View →

cs.DBcs.AIRecentMay 29, 2026

SpecDB: LLM-Generated Customized Databases via Feature-Oriented Decomposition

Yunkai Lou, Longbin Lai, Shunyang Li, Zhengping Qian +1 more

SpecDB is a novel system that uses LLMs to synthesize highly customized, purpose-built relational databases, achieving performance comparable to commercial systems while significantly reducing code si…

View →

cs.DBcs.AIRecentMay 29, 2026

Sophrosyne: Agentic Exploration of Relational Data Systems Needs Moderation

Madhav Jivrajani, Ramnatthan Alagappan, Aishwarya Ganesan

The paper introduces Sophrosyne, a system that moderates LLM agent exploration in relational data systems, significantly reducing over-exploration and boosting SQL generation accuracy by guiding the a…

View →

cs.CLRecentMay 30, 2026

Learning to Retrieve: Dual-Level Long-Term Memory for Text-to-SQL Agents

Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao +1 more

The paper proposes MERIT, a dual-level, multi-horizon memory retrieval framework that significantly improves the performance of interactive text-to-SQL agents by providing both global and local memory…

View →

cs.SEcs.AIRecentMay 28, 2026

CodeGolf Bench: A Multi-Language Benchmark for Evaluating Concise Code Generation Capabilities of Large Language Models

Vedant Padwal

The paper introduces CodeGolf Bench, a novel multi-language benchmark using code golf to measure LLMs' ability to generate highly concise and efficient code, showing that reasoning models significantl…

View →

cs.SEcs.AIRecentMay 28, 2026

Inferring Code Correctness from Specification

Tambon Florian, Papadakis Mike

The paper introduces TRAILS~, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…

View →

cs.CLRecentMay 31, 2026

Benchmarking Local LLMs for Natural-Language-to-SQL Querying in Biopharmaceutical Manufacturing: An Empirical Benchmark on Consumer-Grade Hardware

Sagar Bhetwal, Rajan Bastakoti, Nirajan Acharya, Gaurav Kumar Gupta

This study benchmarks four local LLMs for natural-language-to-SQL querying in biopharma manufacturing, finding that general-purpose code-tuned models like Llama 3.1 8B and Qwen 2.5 Coder 7B outperform…

View →

cs.LOcs.AIRecentMay 27, 2026

Token Optimization Strategies for LLM-Based Oracle-to-PostgreSQL Migration

Oleg Grynets, Dmytro Babarytskyi, Vasyl Lyashkevych

This paper formalizes token optimization as a multi-objective constrained transformation problem for LLM-based Oracle-to-PostgreSQL migration, demonstrating that adaptive routing offers the best balan…

View →

cs.AIRecentMay 28, 2026

MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection

Haowen Wang, Yaxin Du, Jian Yang, Jiajun Wu +8 more

MIRA proposes a novel source-aware filtering framework that discovers and anchors evaluation rubrics during data selection, significantly improving code-oriented mid-training data quality while reduci…

View →

cs.SEcs.CRRecentMay 25, 2026

FuzzPilot: Plateau-Triggered Recipe Validation for Structured Text Fuzzing

Zhiyi Yao

FuzzPilot is a controller for AFL++ that validates candidate mutation recipes by running short micro-campaigns, demonstrating a mechanism to manage fuzzing plateaus, though initial results on a satura…

View →

cs.CRcs.AIcs.ETRecentMay 11, 2026

Adversarial SQL Injection Generation with LLM-Based Architectures

Ali Karakoc, H. Birkan Yilmaz

The paper evaluates two novel LLM-based systems, RADAGAS and RefleXQLi, for generating adversarial SQL injection payloads, finding that RADAGAS-GPT4o achieves a high bypass rate, particularly against…

View →

cs.CLcs.AIcs.IRRecentMay 28, 2026

OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources

Jinheon Baek, Soyeong Jeong, Sangwoo Park, Woongyeong Yeo +4 more

OmniRetrieval introduces a unified framework that handles natural language queries across diverse, heterogeneous knowledge sources (text, relational, graphs) by dispatching source-native queries witho…

View →

cs.CRRecentApr 16, 2026

Feedback-Driven Execution for LLM-Based Binary Analysis

XiangRui Zhang, Qiang Li, Haining Wang

The paper introduces FORGE, a feedback-driven execution system that improves LLM-based binary analysis by interleaving reasoning and tool interaction, achieving high-quality vulnerability discovery on…

View →

cs.CLcs.AIcs.LGRecentJun 1, 2026

Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning

Atoosa Chegini, Soheil Feizi

The paper introduces Chunk-Level Guided Generation, a training-free method that uses an off-the-shelf large language model (LLM) as a process scorer to guide small model generation, achieving performa…

View →

cs.CLRecentMay 29, 2026

Semantic Triplet Restoration: A Novel Protocol for Hierarchical Table Understanding in Large Language Models

Yibin Zhao, Fangxin Shang, Dingrui Yang, Yuqi Wang

The paper introduces Semantic Triplet Restoration (STR), a novel protocol that converts complex table structures into atomic semantic triplets, improving table question answering by providing explicit…

View →

cs.AIcs.DBRecentMay 27, 2026

A Query Engine for the Agents

Kenny Daniel

The paper introduces Hyperparam, a set of lightweight JavaScript libraries designed to enable direct, model-aware querying of unstructured data (like agent traces) within client-side AI applications.

View →

cs.IRcs.AIRecentMay 30, 2026

SkillPager: Query-Adaptive Intra-Skill Navigation via Semantic Node Retrieval

Zicai Cui, Zihan Guo, Weiwen Liu, Weinan Zhang

SkillPager is a novel two-stage framework that efficiently selects minimal, execution-sufficient context from large procedural skill documents by leveraging typed semantic nodes, significantly reducin…

View →

cs.CLRecentJun 1, 2026

Benchmarking LLM-as-a-Judge for Long-Form Output Evaluation

Junjie Chen, Yuxi Dong, Haitao Li, Weihang Su +4 more

The paper introduces LongJudgeBench, a new benchmark designed to evaluate the reliability of LLM judges specifically for complex, long-form output evaluation, revealing significant instability gaps in…

View →