~ similar to 2605.30748· 11 results
Junxia Cui, Haotian Ye, Runchu Tian, Hongcan Guo +8 more
The paper proposes SimSD, a plug-and-play speculative decoding algorithm that adapts diffusion language models (dLLMs) to achieve fast, token-level acceleration by restoring causal masking capabilitie…
Murmur is an efficient inference system for long-form ASR that resolves the accuracy-latency trade-off by optimizing both inter-chunk processing and intra-chunk attention mechanisms.
Yuchen Zhu, Jing Shi, Chongjian Ge, Hao Tan +8 more
FLARE is a systematic conversion framework that enables a single checkpoint to support both autoregressive (AR) and diffusion-style parallel decoding for hybrid-attention large language models, achiev…
Longxuan Yu, Yunshu Wu, Yu Fu, Siheng Xiong +4 more
The paper introduces DSL-LLaDA, a method that lightly adapts a pre-trained masked diffusion language model to perform continuous denoising in embedding space, significantly improving text generation q…
Xiaoyou Wu, Cheng-Jhih Shih, Binfei Ji, Yong Liu +1 more
BlockBatch introduces a novel framework that efficiently accelerates diffusion language model (dLLM) inference by simultaneously executing multiple block-size branches for a single request, achieving…
Zekai Li, Ji Liu, Yiqing Huang, Ziqiong Liu +2 more
The paper proposes a novel trace-aware decoding framework, combining Temporal-Spatial Parallel Decoding (TSPD) and Confidence Extrapolation (CE), to significantly accelerate the inference of diffusion…
Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu +3 more
PolySpeech-100 introduces a massive, multi-lingual benchmark covering 110 linguistic variants to rigorously test Speech-LLMs, demonstrating that open-source models struggle with low-resource languages…
The paper proposes DLLM-VSR, a novel Diffusion Large Language Model framework for Visual Speech Recognition, achieving state-of-the-art performance by treating transcription as iterative masked denois…
The paper introduces DLM-SWAI, a training-free method that effectively steers diffusion language models (DLMs) toward desired textual styles or properties by biasing the token distribution at each den…
Yanjie An, Yuxiang Zhao, Yichi Zhang, Qixi Zheng +4 more
The paper introduces OpenSTBench, a unified, multidimensional evaluation framework designed to comprehensively compare heterogeneous speech translation systems by jointly assessing translation, speech…
TAPS introduces a target-aware prefix selection method that optimizes the trade-off between draft tree acceptance and verification cost, achieving significant speedups in speculative decoding.