Similar to 2605.30813 · 18 results
The paper proposes a unified framework for designing efficient and expressive token mixing layers by separating the direct and recurrent influences of inputs, allowing for a principled trade-off betwe…
The paper introduces CFGzip, an offline token space compression technique that significantly reduces the computational overhead of constrained decoding, making complex grammar enforcement feasible at…
This study benchmarks token-optimized formats (TOON and TRON) against JSON in end-to-end agentic AI systems, finding that TRON significantly reduces token overhead with minimal performance degradation…
This paper formalizes token optimization as a multi-objective constrained transformation problem for LLM-based Oracle-to-PostgreSQL migration, demonstrating that adaptive routing offers the best balan…
Meifang Chen, Zhe Yang, Huang Nianchen, Yizhan Huang +3 more
This paper investigates how Byte-Pair Encoding (BPE) tokenization causes Code LLMs to disproportionately memorize certain types of secrets, a phenomenon termed 'gibberish bias'.
The paper proposes EPIC, an efficient, parallel decoding framework that significantly speeds up constraining diffusion language model outputs with context-free grammars (CFGs).
Yijiong Yu, Huazheng Wang, Shuai Yuan, Ruilong Ren +1 more
The paper proposes Speculative Pipeline Decoding (SPD), a novel framework that uses pipeline parallelism to accelerate LLM inference by processing multiple tokens in parallel, achieving higher speedup…
SentGuard introduces a novel sentence-level streaming guardrail that operates in parallel with LLM generation, achieving high detection rates of unsafe content early in the response while maintaining…
TAPS introduces a target-aware prefix selection method that optimizes the trade-off between draft tree acceptance and verification cost, achieving significant speedups in speculative decoding.
Elevator is a novel, deterministic binary translator that statically translates entire x86-64 executables to AArch64 by considering all possible interpretations of every byte, eliminating the need for…
The paper introduces codebadger, a Model Context Protocol (MCP) server that integrates Joern's Code Property Graph (CPG) with LLMs, enabling large language models to perform large-scale, semantic prog…
The paper introduces a hybrid system, HYBRIDSOURCETRACKER (HST), that combines vector search and Winnowing fingerprinting to achieve scalable, high-precision provenance tracking for code generated by…
This study systematically evaluates a wide range of chunking methods for Retrieval-Augmented Generation (RAG) to assess their effectiveness and highlight the overlooked challenges associated with chun…
Han Dai, Soumyakant Priyadarshan, Abdullah Imran, Ruoyu Wang +1 more
SCRIBE is a novel framework that enables reliable source-level patching of binaries by performing 'binary-aware' recompilation, successfully resolving syntactic and semantic inaccuracies inherent in d…
SEMBridge is a tagless-final framework that allows a single executable object program to generate multiple program semantics, including weakest-precondition and bounded-checking interpretations, ensur…
Pei-Yu Tseng, Lan Zhang, ZihDwo Yeh, Xiaoyan Sun +2 more
The paper introduces IOCRegex-gen, an automated LLM-based system that converts Indicators of Compromise (IOCs) into syntactically and semantically correct regular expressions, achieving high accuracy…
FPMoE introduces a sparse Mixture-of-Experts (MoE) architecture to improve functional code generation across multiple functional programming languages, achieving state-of-the-art performance with fewe…
The paper introduces the first byte-native Large Language Model (LLM) capable of analyzing raw executable binary data, achieving high accuracy in tasks like malware and architecture classification.