~ similar to 2606.12387· 20 results
Leo Luo, Haining Xie, Siqi Shen, Zhipeng Ma +7 more
SIRIUS-SQL introduces a robust multi-candidate text-to-SQL system that addresses weaknesses in candidate generation, error handling, and selection, achieving state-of-the-art performance on complex be…
The paper proposes a multi-layered security framework to detect and mitigate SQL injection attacks that occur when Large Language Models translate natural language prompts into database queries.
This paper formalizes token optimization as a multi-objective constrained transformation problem for LLM-based Oracle-to-PostgreSQL migration, demonstrating that adaptive routing offers the best balan…
Huawei Zheng, Sen Yang, Zhaorui Yang, Yuhui Zhang +11 more
EviLink addresses the ambiguity of schema linking in Text-to-SQL by treating it as an uncertainty-aware inference over multiple plausible SQL paths, significantly improving recall and efficiency.
BADGER is a unified, production-grade evaluation framework that integrates text-to-SQL assessment with agentic behavior evaluation, significantly outperforming existing benchmarks on industry queries.
The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…
Marko Kojic, Ivan Bondyrev, Aral de Moor, Joseph Shtok +5 more
Mellum 2 is an open-weight 12B Mixture-of-Experts (MoE) language model specialized for software engineering, achieving performance competitive with larger models while maintaining the efficiency of a…
The paper introduces FORGE, a feedback-driven execution system that improves LLM-based binary analysis by interleaving reasoning and tool interaction, achieving high-quality vulnerability discovery on…
The paper introduces TRAILS~, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…
Gaetan Narozniak, Gérard Biau, Rémi Munos, Ahmad Rammal +1 more
The paper introduces Feedback Distillation, a novel training method that uses a language model's privileged feedback to provide token-level supervision, significantly improving complex reasoning tasks…
The paper introduces CodeGolf Bench, a novel multi-language benchmark using code golf to measure LLMs' ability to generate highly concise and efficient code, showing that reasoning models significantl…
Prompt Codebooks (PCO) introduces a compositional framework that treats prompt optimization as discrete learning over reusable instruction units, significantly improving LLM performance while drastica…
The paper introduces Hyperparam, a set of lightweight JavaScript libraries designed to enable direct, model-aware querying of unstructured data (like agent traces) within client-side AI applications.
The paper introduces Contrastive Reflection (CORE), a novel non-parametric method that rapidly improves language model reasoning by distilling contrasts between successful and unsuccessful problem att…
The paper introduces LinTree, a method that explicitly structures the search history of LLM reasoning traces using parent pointers, significantly improving task performance and search efficiency compa…
Yunkai Lou, Longbin Lai, Shunyang Li, Zhengping Qian +1 more
SpecDB is a novel system that uses LLMs to synthesize highly customized, purpose-built relational databases, achieving performance comparable to commercial systems while significantly reducing code si…
Tong Ye, Hang Yu, Tengfei Ma, Xuhong Zhang +5 more
The paper introduces DOMINO, a novel inductive framework that synthesizes domain-specific data for LLMs using only reference examples, significantly improving performance on challenging, implicitly de…
The paper proposes projectional decoding, a novel framework that integrates a partial graph model alongside text generation to ensure the semantic validity of LLM-generated software artifacts.
The paper introduces Sophrosyne, a system that moderates LLM agent exploration in relational data systems, significantly reducing over-exploration and boosting SQL generation accuracy by guiding the a…
This study benchmarks four local LLMs for natural-language-to-SQL querying in biopharma manufacturing, finding that general-purpose code-tuned models like Llama 3.1 8B and Qwen 2.5 Coder 7B outperform…