~ similar to 2606.01246· 20 results
Huawei Zheng, Sen Yang, Zhaorui Yang, Yuhui Zhang +11 more
EviLink addresses the ambiguity of schema linking in Text-to-SQL by treating it as an uncertainty-aware inference over multiple plausible SQL paths, significantly improving recall and efficiency.
The paper presents Tahoe, a system that optimizes Text-to-SQL performance through dynamic data management and hint learning.
BADGER is a unified, production-grade evaluation framework that integrates text-to-SQL assessment with agentic behavior evaluation, significantly outperforming existing benchmarks on industry queries.
Yunkai Lou, Longbin Lai, Shunyang Li, Zhengping Qian +1 more
SpecDB is a novel system that uses LLMs to synthesize highly customized, purpose-built relational databases, achieving performance comparable to commercial systems while significantly reducing code si…
The paper introduces Sophrosyne, a system that moderates LLM agent exploration in relational data systems, significantly reducing over-exploration and boosting SQL generation accuracy by guiding the a…
Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao +1 more
The paper proposes MERIT, a dual-level, multi-horizon memory retrieval framework that significantly improves the performance of interactive text-to-SQL agents by providing both global and local memory…
The paper introduces CodeGolf Bench, a novel multi-language benchmark using code golf to measure LLMs' ability to generate highly concise and efficient code, showing that reasoning models significantl…
The paper introduces TRAILS~, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…
This study benchmarks four local LLMs for natural-language-to-SQL querying in biopharma manufacturing, finding that general-purpose code-tuned models like Llama 3.1 8B and Qwen 2.5 Coder 7B outperform…
This paper formalizes token optimization as a multi-objective constrained transformation problem for LLM-based Oracle-to-PostgreSQL migration, demonstrating that adaptive routing offers the best balan…
Haowen Wang, Yaxin Du, Jian Yang, Jiajun Wu +8 more
MIRA proposes a novel source-aware filtering framework that discovers and anchors evaluation rubrics during data selection, significantly improving code-oriented mid-training data quality while reduci…
FuzzPilot is a controller for AFL++ that validates candidate mutation recipes by running short micro-campaigns, demonstrating a mechanism to manage fuzzing plateaus, though initial results on a satura…
The paper evaluates two novel LLM-based systems, RADAGAS and RefleXQLi, for generating adversarial SQL injection payloads, finding that RADAGAS-GPT4o achieves a high bypass rate, particularly against…
Jinheon Baek, Soyeong Jeong, Sangwoo Park, Woongyeong Yeo +4 more
OmniRetrieval introduces a unified framework that handles natural language queries across diverse, heterogeneous knowledge sources (text, relational, graphs) by dispatching source-native queries witho…
The paper introduces FORGE, a feedback-driven execution system that improves LLM-based binary analysis by interleaving reasoning and tool interaction, achieving high-quality vulnerability discovery on…
The paper introduces Chunk-Level Guided Generation, a training-free method that uses an off-the-shelf large language model (LLM) as a process scorer to guide small model generation, achieving performa…
The paper introduces Semantic Triplet Restoration (STR), a novel protocol that converts complex table structures into atomic semantic triplets, improving table question answering by providing explicit…
The paper introduces Hyperparam, a set of lightweight JavaScript libraries designed to enable direct, model-aware querying of unstructured data (like agent traces) within client-side AI applications.
SkillPager is a novel two-stage framework that efficiently selects minimal, execution-sufficient context from large procedural skill documents by leveraging typed semantic nodes, significantly reducin…
Junjie Chen, Yuxi Dong, Haitao Li, Weihang Su +4 more
The paper introduces LongJudgeBench, a new benchmark designed to evaluate the reliability of LLM judges specifically for complex, long-form output evaluation, revealing significant instability gaps in…