Papers similar to 2605.30753

~ similar to 2605.30753· 18 results

cs.CLcs.AIRecentMay 31, 2026

DSL-LLaDA: Scaling Continuous Denoising to 8B Masked Diffusion LMs

Longxuan Yu, Yunshu Wu, Yu Fu, Siheng Xiong +4 more

The paper introduces DSL-LLaDA, a method that lightly adapts a pre-trained masked diffusion language model to perform continuous denoising in embedding space, significantly improving text generation q…

View →

cs.CLcs.AIRecentJun 1, 2026

SimSD: Simple Speculative Decoding in Diffusion Language Models

Junxia Cui, Haotian Ye, Runchu Tian, Hongcan Guo +8 more

The paper proposes SimSD, a plug-and-play speculative decoding algorithm that adapts diffusion language models (dLLMs) to achieve fast, token-level acceleration by restoring causal masking capabilitie…

View →

cs.AIcs.CLRecentMay 27, 2026

The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models

Dueun Kim, Albert No

The paper argues that using confidence-based decoding, which is optimized via training mask alignment, fundamentally misaligns Masked Diffusion Models (MDMs) from the logical flow needed for complex r…

View →

cs.LGcs.AIRecentMay 28, 2026

BlockBatch: Multi-Scale Consensus Decoding for Efficient Diffusion Language Model Inference

Xiaoyou Wu, Cheng-Jhih Shih, Binfei Ji, Yong Liu +1 more

BlockBatch introduces a novel framework that efficiently accelerates diffusion language model (dLLM) inference by simultaneously executing multiple block-size branches for a single request, achieving…

View →

cs.AIcs.CVeess.ASRecentMay 27, 2026

Diffusion Large Language Models for Visual Speech Recognition

Jeong Hun Yeo, Chae Won Kim, Hyeongseop Rha, Yong Man Ro

The paper proposes DLLM-VSR, a novel Diffusion Large Language Model framework for Visual Speech Recognition, achieving state-of-the-art performance by treating transcription as iterative masked denois…

View →

cs.CLcs.AIRecentMay 28, 2026

DLM-SWAI: Steering Diffusion Language Models Before They Unmask

Hyeseon An, Yo-Sub Han

The paper introduces DLM-SWAI, a training-free method that effectively steers diffusion language models (DLMs) toward desired textual styles or properties by biasing the token distribution at each den…

View →

cs.LGcs.AIRecentJun 1, 2026

FLARE: Diffusion for Hybrid Language Model

Yuchen Zhu, Jing Shi, Chongjian Ge, Hao Tan +8 more

FLARE is a systematic conversion framework that enables a single checkpoint to support both autoregressive (AR) and diffusion-style parallel decoding for hybrid-attention large language models, achiev…

View →

cs.CLRecentMay 31, 2026

Revise, Don't Freeze: Sampler-Matched Training for Self-Correcting Masked Diffusion Language Models

Longxuan Yu, Shaorong Zhang, Yu Fu, Hui Liu +2 more

The paper introduces D3IM, a novel parameter-free sampler that enables direct revision of visible tokens in Masked Diffusion Language Models, and proposes SCOPE to mitigate the model's tendency to per…

View →

cs.CLcs.AIRecentMay 30, 2026

EPIC: Efficient and Parallel Inference under CFG Constraints for Diffusion Language Models

Hyundong Jin, Yo-Sub Han

The paper proposes EPIC, an efficient and parallel decoding framework that significantly speeds up the process of constraining diffusion language model outputs using Context-Free Grammars (CFG).

View →

cs.CLRecentMay 29, 2026

dMoE: dLLMs with Learnable Block Experts

Sicheng Feng, Zigeng Chen, Gongfan Fang, Xinyin Ma +1 more

dMoE proposes a block-level Mixture-of-Experts (MoE) framework for Diffusion Large Language Models (dLLMs) that aggregates token-level expert distributions into a unified block-level distribution, sig…

View →

cs.CLRecentMay 31, 2026

Decoding in Order-Agnostic Language Models: Chain-Rule Deviation and Uniform Spreading

Lin Yao

The paper analyzes order-agnostic language models (OALMs), finding that their learned conditionals are not true factorizations and proposing a variance-based diagnostic to compare the quality of diffe…

View →

cs.LGcs.AIRecentMay 28, 2026

GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models

Xiaohang Tang, Keyue Jiang, Che Liu, Qifang Zhao +3 more

The paper proposes Guided Denoiser Self-Distillation (GDSD), a novel method that bypasses the use of likelihood surrogates (like ELBO) in RL for diffusion language models, achieving state-of-the-art p…

View →

cs.AIRecentMay 28, 2026

NaRA: Noise-Aware LoRA for Parameter-Efficient Fine-Tuning of Diffusion LLMs

Shuaidi Wang, Zhan Zhuang, Ruping Huang, Yu Zhang

The paper introduces NaRA, a noise-aware LoRA technique that dynamically adapts fine-tuning parameters based on the noise level during diffusion, significantly improving the performance of Diffusion L…

View →

cs.LGcs.CRRecentMay 19, 2026

Backdooring Masked Diffusion Language Models

Daniel Yiming Cao, Chengzhong Wang, Sheng-Yen Chou, Chengyu Huang +2 more

The paper introduces SHADOWMASK, the first systematic backdoor attack targeting Masked Diffusion Language Models (MDLMs), demonstrating near-100% attack success while preserving clean model utility.

View →

cs.CLcs.AIcs.LGRecentJun 4, 2026

Self-Augmenting Retrieval for Diffusion Language Models

Paul Jünger, Justin Lovelace, Linxi Zhao, Dongyoung Go +1 more

The paper introduces SARDI, a novel, training-free framework that uses low-confidence 'lookahead' tokens generated during the denoising process of discrete diffusion language models to dynamically gui…

View →

cs.CLcs.AIRecentMay 29, 2026

What Gets Unmasked First? Trajectory Analysis of Diffusion Models for Graph-to-Text Generation

Qing Wang, Jacob Devasier, Chengkai Li

This paper analyzes the decoding process of masked diffusion models for graph-to-text generation, finding that structural fine-tuning disrupts natural entity-first generation and proposing a structura…

View →

cs.SDcs.AIeess.ASRecentMay 29, 2026

Chatterbox-Flash: Prior-Calibrated Block Diffusion for Streaming Zero-Shot TTS

Deokjin Seo, Gangin Park, Kihyun Nam

Chatterbox-Flash introduces a prior-calibrated block diffusion model for zero-shot TTS that achieves high-fidelity, streaming synthesis with significantly lower computational overhead than existing me…

View →

cs.CLcs.AIRecentMay 30, 2026

WaveFilter: Enhancing the Long-Context Capability of Diffusion LLMs via Wavelet-Guided KV Cache Filtering

Jinnan Yang, Yan Wang, Zhen Bi, Kehao Wu +4 more

WaveFilter is a novel, training-free framework that uses wavelet transforms to efficiently filter critical tokens in the KV cache, significantly improving the long-context performance of Diffusion LLM…

View →