~ similar to 2605.28557· 20 results
The paper presents Tahoe, a system that optimizes Text-to-SQL performance through dynamic data management and hint learning.
Yunkai Lou, Longbin Lai, Shunyang Li, Zhengping Qian +1 more
SpecDB is a novel system that uses LLMs to synthesize highly customized, purpose-built relational databases, achieving performance comparable to commercial systems while significantly reducing code si…
The paper introduces CodeGolf Bench, a novel multi-language benchmark using code golf to measure LLMs' ability to generate highly concise and efficient code, showing that reasoning models significantl…
This study benchmarks token-optimized formats (TOON and TRON) against JSON in end-to-end agentic AI systems, finding that TRON significantly reduces token overhead with minimal performance degradation…
The paper introduces REBench, a comprehensive, standardized benchmark dataset designed to enable fair and rigorous evaluation of Large Language Models (LLMs) on complex binary reverse engineering task…
This study benchmarks four local LLMs for natural-language-to-SQL querying in biopharma manufacturing, finding that general-purpose code-tuned models like Llama 3.1 8B and Qwen 2.5 Coder 7B outperform…
Leo Luo, Haining Xie, Siqi Shen, Zhipeng Ma +7 more
SIRIUS-SQL introduces a robust multi-candidate text-to-SQL system that addresses weaknesses in candidate generation, error handling, and selection, achieving state-of-the-art performance on complex be…
This paper benchmarks LLMs for smart contract security analysis, concluding that while LLMs show potential, their reliability is limited by lexical bias and requires integration with traditional stati…
The paper evaluates two novel LLM-based systems, RADAGAS and RefleXQLi, for generating adversarial SQL injection payloads, finding that RADAGAS-GPT4o achieves a high bypass rate, particularly against…
The paper introduces an efficient, novel algorithm for incremental Byte Pair Encoding (BPE) tokenization that processes input text prefix by prefix, achieving significant speedups and enabling streami…
Huawei Zheng, Sen Yang, Zhaorui Yang, Yuhui Zhang +11 more
EviLink addresses the ambiguity of schema linking in Text-to-SQL by treating it as an uncertainty-aware inference over multiple plausible SQL paths, significantly improving recall and efficiency.
Yangzhen Wu, Aaron J. Li, Wenjie Ma, Li Cao +9 more
BenchEvolver introduces a solution-centric evolutionary framework to automatically transform saturated coding benchmarks into significantly harder, high-quality, and diverse evaluation suites.
SecureMCP proposes a novel, policy-enforced framework that integrates Role-Based Access Control (RBAC) with an MCP server to provide multi-layer, fine-grained defense against malicious LLM-generated S…
The paper introduces Sophrosyne, a system that moderates LLM agent exploration in relational data systems, significantly reducing over-exploration and boosting SQL generation accuracy by guiding the a…
The paper proposes a multi-layered security framework to detect and mitigate SQL injection attacks that occur when Large Language Models translate natural language prompts into database queries.
The paper introduces SCDBench, a comprehensive benchmark dataset and methodology that rigorously evaluates LLM-based smart contract decompilers, finding that while frontier models can produce compilab…
The paper introduces SCDBench, a comprehensive benchmark dataset and methodology that rigorously evaluates LLM-based smart contract decompilers, finding that while frontier LLMs can generate compilabl…
The paper introduces CFGzip, an offline token space compression technique that significantly reduces the computational overhead of constrained decoding, making complex grammar enforcement feasible at…
The paper introduces Sieve, a system that uses a large language model (LLM) to generate executable query code from natural language security questions, significantly improving the ability to perform c…
This study empirically evaluates the cryptographic security of LLM-generated Rust code, finding that while general analysis tools are insufficient, a custom crypto-specific analyzer successfully ident…