20 results for “Decoding-time key-value compression”
CS papers onlyHybrid search: Keyword + semantic, ranked by combined score.ⓘ
Want pure semantic search? Try claim verification →
Wenhao Liu, Hao Shi, Yunhe Li, Weizhi Fei +6 more
This paper proposes a training-free framework called ReasonAlloc to mitigate inference bottlenecks in large language models by recasting decoding-time key-value compression as a hierarchical budget al…
Moment-KV introduces a novel momentum-based technique to compress the Key-Value (KV) cache during the decoding phase of LLM generation, significantly improving fidelity in long-generation tasks.
The paper introduces CFGzip, an offline token space compression technique that significantly reduces the computational overhead of constrained decoding, making complex grammar enforcement feasible at…
The paper introduces SB-ECC, a novel score-based decoder that models error correction as continuous-time denoising, achieving state-of-the-art performance across various code families and noise levels…
Jinnan Yang, Yan Wang, Zhen Bi, Kehao Wu +4 more
WaveFilter is a novel, training-free framework that uses wavelet transforms to efficiently filter critical tokens in the KV cache, significantly improving the long-context performance of Diffusion LLM…
The paper constructs high-rate public-key pseudorandom codes (PRCs) robust against edit errors, providing the first such binary constructions under assumptions that yield Hamming-robust PRCs.
The paper establishes information-theoretic lower bounds for stochastic optimization using low-bit gradients by reducing the problem to compressed Gaussian mean estimation, yielding sharp bounds on co…
This paper investigates the redundancy of the prompt KV cache during language model decoding, finding that the structure provided by chat templates is the primary source of redundancy, not the actual…
Junjie Peng, You Wu, Haoyi Wu, Jialong Han +3 more
GRKV introduces a training-free KV-cache merging method that uses global regression to distribute information from evicted tokens, solving the over-merging problem inherent in span-based retention.
The paper introduces the base-m length codec, a canonical and robust encoding scheme that maps byte strings to lists of residues modulo m, essential for finite-ring cryptosystems.
TimeMark proposes a trustworthy time watermarking framework that uses cryptographic techniques and error-correcting codes to achieve 100% accurate recovery of the generation time from AIGC, resisting…
This paper proposes a novel Simultaneous Data Compression and Encryption (SDCE) system that combines chaotic map-based encryption with Huffman encoding to securely and efficiently transmit large video…
Zhi Zhou, Ming Yang, Shi-Yu Tian, Kun-Yang Yu +2 more
The paper establishes the first theoretical framework for analyzing the learnability of Test-Time Adaptation (TTA) under non-stationary data streams by introducing Recovery Complexity, which quantifie…
TAPS introduces a target-aware prefix selection method that optimizes the trade-off between draft tree acceptance and verification cost, achieving significant speedups in speculative decoding.
This paper systematically analyzes combining dimensionality reduction and quantization to compress text embeddings, showing that this combined approach achieves substantial compression (e.g., 0.1% siz…
This paper proposes Stringology-Based Cryptology (SBC), a novel approach that analyzes the structural properties of cryptographic outputs by treating them as symbolic sequences, offering complementary…
Clark Hash is a stateless, deterministic quantization method that significantly reduces the storage size of neural embeddings while maintaining high accuracy for cosine similarity search.
Yuhang Han, Wenzheng Yang, Yujie Chen, Xiangqi Jin +3 more
STaR-KV introduces a novel, training-free KV cache compression framework that adaptively re-weights token importance across spatial, temporal, and distributional axes, significantly reducing GPU memor…
The paper introduces Morlet Positional Encoding (MoPE), a novel wavelet-based positional encoding that models position and locality simultaneously, outperforming standard sinusoidal and RoPE methods.
The paper establishes that the existence of many-time secure uncloneable encryption (UCE) can be shown to follow from relatively weak assumptions, such as the existence of many-time secure symmetric k…