ArXivCSExplorer
Built by Teycir Ben Soltane

Similar to 2606.12387 · 20 results

cs.AI · Recent · May 31, 2026

SIRIUS-SQL: Anchoring Multi-Candidate Text-to-SQL in Execution Feedback

Leo Luo, Haining Xie, Siqi Shen, Zhipeng Ma +7 more

SIRIUS-SQL introduces a robust multi-candidate text-to-SQL system that addresses weaknesses in candidate generation, error handling, and selection, achieving state-of-the-art performance on complex be…

cs.CR, cs.AI · Recent · May 11, 2026

When Prompts Become Payloads: A Framework for Mitigating SQL Injection Attacks in Large Language Model-Driven Applications

Farzad Nourmohammadzadeh Motlagh, Mehrdad Hajizadeh, Mehryar Majd, Pejman Najafi +2 more

The paper proposes a multi-layered security framework to detect and mitigate SQL injection attacks that occur when Large Language Models translate natural language prompts into database queries.

cs.LO, cs.AI · Recent · May 27, 2026

Token Optimization Strategies for LLM-Based Oracle-to-PostgreSQL Migration

Oleg Grynets, Dmytro Babarytskyi, Vasyl Lyashkevych

This paper formalizes token optimization as a multi-objective constrained transformation problem for LLM-based Oracle-to-PostgreSQL migration, demonstrating that adaptive routing offers the best balan…

cs.CL, cs.AI · Recent · May 28, 2026

EviLink: Multi-Path Schema Linking with Uncertainty-Guided Evidence Acquisition for Large-Scale Text-to-SQL

Huawei Zheng, Sen Yang, Zhaorui Yang, Yuhui Zhang +11 more

EviLink addresses the ambiguity of schema linking in Text-to-SQL by treating it as an uncertainty-aware inference over multiple plausible SQL paths, significantly improving recall and efficiency.

cs.AI · Recent · Jun 1, 2026

BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning

Shannon Serrao, Soumitra Chatterjee, Dorina Strori, Abhishek Sharma +1 more

BADGER is a unified, production-grade evaluation framework that integrates text-to-SQL assessment with agentic behavior evaluation, significantly outperforming existing benchmarks on industry queries.

cs.CR, cs.AI, cs.LG · Recent · May 22, 2026

An Empirical Evaluation of LLM-Generated Code Security Across Prompting Methods

Mohammed Kharma, Ahmed Sabbah, Mohammad Alkhanafseh, Mohammad Hammoudeh +1 more

The paper empirically evaluates the security quality of LLM-generated code across various prompting methods, finding that while prompting alters the structure of weaknesses, it is insufficient to reli…

cs.CL · Recent · May 29, 2026

Mellum2 Technical Report

Marko Kojic, Ivan Bondyrev, Aral de Moor, Joseph Shtok +5 more

Mellum 2 is an open-weight 12B Mixture-of-Experts (MoE) language model specialized for software engineering, achieving performance competitive with larger models while maintaining the efficiency of a…

cs.CR · Recent · Apr 16, 2026

Feedback-Driven Execution for LLM-Based Binary Analysis

XiangRui Zhang, Qiang Li, Haining Wang

The paper introduces FORGE, a feedback-driven execution system that improves LLM-based binary analysis by interleaving reasoning and tool interaction, achieving high-quality vulnerability discovery on…

cs.SE, cs.AI · Recent · May 28, 2026

Inferring Code Correctness from Specification

Tambon Florian, Papadakis Mike

The paper introduces TRAILS, a novel method that improves code correctness validation by grounding LLM reasoning in concrete (input, output) pairs derived from specifications, achieving state-of-the-…

cs.AI · Recent · May 29, 2026

Distilling LLM Feedback for Lean Theorem Proving

Gaetan Narozniak, Gérard Biau, Rémi Munos, Ahmad Rammal +1 more

The paper introduces Feedback Distillation, a novel training method that uses a language model's privileged feedback to provide token-level supervision, significantly improving complex reasoning tasks…

cs.SE, cs.AI · Recent · May 28, 2026

CodeGolf Bench: A Multi-Language Benchmark for Evaluating Concise Code Generation Capabilities of Large Language Models

Vedant Padwal

The paper introduces CodeGolf Bench, a novel multi-language benchmark using code golf to measure LLMs' ability to generate highly concise and efficient code, showing that reasoning models significantl…

cs.AI · Recent · May 27, 2026

Prompt Codebooks: Discrete Compositional Optimization for Language Model Instruction Refinement

Jyotirmoy Nath, Neeraj Kumar, Brejesh Lall

Prompt Codebooks (PCO) introduces a compositional framework that treats prompt optimization as discrete learning over reusable instruction units, significantly improving LLM performance while drastica…

cs.AI, cs.DB · Recent · May 27, 2026

A Query Engine for the Agents

Kenny Daniel

The paper introduces Hyperparam, a set of lightweight JavaScript libraries designed to enable direct, model-aware querying of unstructured data (like agent traces) within client-side AI applications.

cs.AI · Recent · May 27, 2026

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

Linas Nasvytis, Simon Jerome Han, Ben Prystawski, Satchel Grant +2 more

The paper introduces Contrastive Reflection (CORE), a novel non-parametric method that rapidly improves language model reasoning by distilling contrasts between successful and unsuccessful problem att…

cs.AI · Recent · May 29, 2026

LinTree: Improving LLM Reasoning with Explicitly Structured Search Histories

Liwei Kang, Yee Whye Teh, Wee Sun Lee

The paper introduces LinTree, a method that explicitly structures the search history of LLM reasoning traces using parent pointers, significantly improving task performance and search efficiency compa…

cs.DB, cs.AI · Recent · May 29, 2026

SpecDB: LLM-Generated Customized Databases via Feature-Oriented Decomposition

Yunkai Lou, Longbin Lai, Shunyang Li, Zhengping Qian +1 more

SpecDB is a novel system that uses LLMs to synthesize highly customized, purpose-built relational databases, achieving performance comparable to commercial systems while significantly reducing code si…

cs.AI · Recent · May 28, 2026

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

Tong Ye, Hang Yu, Tengfei Ma, Xuhong Zhang +5 more

The paper introduces DOMINO, a novel inductive framework that synthesizes domain-specific data for LLMs using only reference examples, significantly improving performance on challenging, implicitly de…

cs.SE, cs.AI · Recent · May 28, 2026

Projectional Decoding: Towards Semantic-Aware LLM Generation

Boqi Chen, José Antonio Hernández López, Aren A. Babikian

The paper proposes projectional decoding, a novel framework that integrates a partial graph model alongside text generation to ensure the semantic validity of LLM-generated software artifacts.

cs.DB, cs.AI · Recent · May 29, 2026

Sophrosyne: Agentic Exploration of Relational Data Systems Needs Moderation

Madhav Jivrajani, Ramnatthan Alagappan, Aishwarya Ganesan

The paper introduces Sophrosyne, a system that moderates LLM agent exploration in relational data systems, significantly reducing over-exploration and boosting SQL generation accuracy by guiding the a…

cs.CL · Recent · May 31, 2026

Benchmarking Local LLMs for Natural-Language-to-SQL Querying in Biopharmaceutical Manufacturing: An Empirical Benchmark on Consumer-Grade Hardware

Sagar Bhetwal, Rajan Bastakoti, Nirajan Acharya, Gaurav Kumar Gupta

This study benchmarks four local LLMs for natural-language-to-SQL querying in biopharma manufacturing, finding that general-purpose code-tuned models like Llama 3.1 8B and Qwen 2.5 Coder 7B outperform…
