Papers similar to 2605.28557

~ similar to 2605.28557· 20 results

cs.DBcs.AIEmpiricalRecentJun 10, 2026

TAHOE: Text-to-SQL with Automated Hint Optimization from Experience

The paper presents Tahoe, a system that optimizes Text-to-SQL performance through dynamic data management and hint learning.

View →

cs.DBcs.AIRecentMay 29, 2026

SpecDB: LLM-Generated Customized Databases via Feature-Oriented Decomposition

Yunkai Lou, Longbin Lai, Shunyang Li, Zhengping Qian +1 more

SpecDB is a novel system that uses LLMs to synthesize highly customized, purpose-built relational databases, achieving performance comparable to commercial systems while significantly reducing code si…

View →

cs.SEcs.AIRecentMay 28, 2026

CodeGolf Bench: A Multi-Language Benchmark for Evaluating Concise Code Generation Capabilities of Large Language Models

Vedant Padwal

The paper introduces CodeGolf Bench, a novel multi-language benchmark using code golf to measure LLMs' ability to generate highly concise and efficient code, showing that reasoning models significantl…

View →

cs.AIcs.CLRecentMay 28, 2026

Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems

Lorenz Kutschka, Bernhard Geiger

This study benchmarks token-optimized formats (TOON and TRON) against JSON in end-to-end agentic AI systems, finding that TRON significantly reduces token overhead with minimal performance degradation…

View →

cs.CRcs.LGcs.SERecentApr 30, 2026

REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)

Jun Yeon Won, Xin Jin, Shiqing Ma, Zhiqiang Lin

The paper introduces REBench, a comprehensive, standardized benchmark dataset designed to enable fair and rigorous evaluation of Large Language Models (LLMs) on complex binary reverse engineering task…

View →

cs.CLRecentMay 31, 2026

Benchmarking Local LLMs for Natural-Language-to-SQL Querying in Biopharmaceutical Manufacturing: An Empirical Benchmark on Consumer-Grade Hardware

Sagar Bhetwal, Rajan Bastakoti, Nirajan Acharya, Gaurav Kumar Gupta

This study benchmarks four local LLMs for natural-language-to-SQL querying in biopharma manufacturing, finding that general-purpose code-tuned models like Llama 3.1 8B and Qwen 2.5 Coder 7B outperform…

View →

cs.AIRecentMay 31, 2026

SIRIUS-SQL: Anchoring Multi-Candidate Text-to-SQL in Execution Feedback

Leo Luo, Haining Xie, Siqi Shen, Zhipeng Ma +7 more

SIRIUS-SQL introduces a robust multi-candidate text-to-SQL system that addresses weaknesses in candidate generation, error handling, and selection, achieving state-of-the-art performance on complex be…

View →

cs.CRcs.AIRecentMay 11, 2026

Benchmarking LLM-Based Static Analysis for Secure Smart Contract Development: Reliability, Limitations, and Potential Hybrid Solutions

Stefan-Claudiu Susan, Andrei Arusoaie, Dorel Lucanu

This paper benchmarks LLMs for smart contract security analysis, concluding that while LLMs show potential, their reliability is limited by lexical bias and requires integration with traditional stati…

View →

cs.CRcs.AIcs.ETRecentMay 11, 2026

Adversarial SQL Injection Generation with LLM-Based Architectures

Ali Karakoc, H. Birkan Yilmaz

The paper evaluates two novel LLM-based systems, RADAGAS and RefleXQLi, for generating adversarial SQL injection payloads, finding that RADAGAS-GPT4o achieves a high bypass rate, particularly against…

View →

cs.CLcs.DSRecentMay 29, 2026

Incremental BPE Tokenization

Shenghu Jiang, Ruihao Gong

The paper introduces an efficient, novel algorithm for incremental Byte Pair Encoding (BPE) tokenization that processes input text prefix by prefix, achieving significant speedups and enabling streami…

View →

cs.CLcs.AIRecentMay 28, 2026

EviLink: Multi-Path Schema Linking with Uncertainty-Guided Evidence Acquisition for Large-Scale Text-to-SQL

Huawei Zheng, Sen Yang, Zhaorui Yang, Yuhui Zhang +11 more

EviLink addresses the ambiguity of schema linking in Text-to-SQL by treating it as an uncertainty-aware inference over multiple plausible SQL paths, significantly improving recall and efficiency.

View →

cs.SEcs.AIcs.CLRecentMay 31, 2026

BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution

Yangzhen Wu, Aaron J. Li, Wenjie Ma, Li Cao +9 more

BenchEvolver introduces a solution-centric evolutionary framework to automatically transform saturated coding benchmarks into significantly harder, high-quality, and diverse evaluation suites.

View →

cs.CRRecentMay 6, 2026

SecureMCP: A Policy-Enforced LLM Data Access Framework for AIoT Systems via Model Context Protocol

Wonbae Kim, Hee-Kyong Yoo

SecureMCP proposes a novel, policy-enforced framework that integrates Role-Based Access Control (RBAC) with an MCP server to provide multi-layer, fine-grained defense against malicious LLM-generated S…

View →

cs.DBcs.AIRecentMay 29, 2026

Sophrosyne: Agentic Exploration of Relational Data Systems Needs Moderation

Madhav Jivrajani, Ramnatthan Alagappan, Aishwarya Ganesan

The paper introduces Sophrosyne, a system that moderates LLM agent exploration in relational data systems, significantly reducing over-exploration and boosting SQL generation accuracy by guiding the a…

View →

cs.CRcs.AIRecentMay 11, 2026

When Prompts Become Payloads: A Framework for Mitigating SQL Injection Attacks in Large Language Model-Driven Applications

Farzad Nourmohammadzadeh Motlagh, Mehrdad Hajizadeh, Mehryar Majd, Pejman Najafi +2 more

The paper proposes a multi-layered security framework to detect and mitigate SQL injection attacks that occur when Large Language Models translate natural language prompts into database queries.

View →

cs.SEcs.AIcs.CRRecentMay 27, 2026

SCDBench: A Benchmark for LLM-Based Smart Contract Decompilers

Kaihua Qin, Dawn Song, Arthur Gervais

The paper introduces SCDBench, a comprehensive benchmark dataset and methodology that rigorously evaluates LLM-based smart contract decompilers, finding that while frontier models can produce compilab…

View →

cs.SEcs.AIcs.CRRecentMay 27, 2026

SCDBench: A Benchmark for LLM-Based Smart Contract Decompilers

Kaihua Qin, Dawn Song, Arthur Gervais

View →

cs.AIRecentMay 28, 2026

Accelerating Constrained Decoding with Token Space Compression

Michael Sullivan, Alexander Koller

The paper introduces CFGzip, an offline token space compression technique that significantly reduces the computational overhead of constrained decoding, making complex grammar enforcement feasible at…

View →

cs.CRRecentMay 21, 2026

Parser-Free Querying of Security Logs

Evan Luo, Julien Piet, David Wagner

The paper introduces Sieve, a system that uses a large language model (LLM) to generate executable query code from natural language security questions, significantly improving the ability to perform c…

View →

cs.CRcs.SERecentApr 29, 2026

An Empirical Security Evaluation of LLM-Generated Cryptographic Rust Code

Mohamed Elsayed, Kenneth Fulton, Jeong Yang

This study empirically evaluates the cryptographic security of LLM-generated Rust code, finding that while general analysis tools are insufficient, a custom crypto-specific analyzer successfully ident…

View →