Similar to 2605.29626 · 19 results
Longxuan Yu, Yunshu Wu, Yu Fu, Siheng Xiong +4 more
The paper introduces DSL-LLaDA, a method that lightly adapts a pre-trained masked diffusion language model to perform continuous denoising in embedding space, significantly improving text generation q…
Zekai Li, Ji Liu, Yiqing Huang, Ziqiong Liu +2 more
The paper proposes a novel trace-aware decoding framework, combining Temporal-Spatial Parallel Decoding (TSPD) and Confidence Extrapolation (CE), to significantly accelerate the inference of diffusion…
Xiaohang Tang, Keyue Jiang, Che Liu, Qifang Zhao +3 more
The paper proposes Guided Denoiser Self-Distillation (GDSD), a novel method that bypasses the use of likelihood surrogates (such as the ELBO) in RL for diffusion language models, achieving state-of-the-art p…
The paper introduces SHADOWMASK, the first systematic backdoor attack targeting Masked Diffusion Language Models (MDLMs), demonstrating near-100% attack success while preserving clean model utility.
This paper analyzes the decoding process of masked diffusion models for graph-to-text generation, finding that structural fine-tuning disrupts natural entity-first generation and proposing a structura…
Yuchen Zhu, Jing Shi, Chongjian Ge, Hao Tan +8 more
FLARE is a systematic conversion framework that enables a single checkpoint to support both autoregressive (AR) and diffusion-style parallel decoding for hybrid-attention large language models, achiev…
The paper introduces 'infilling extraction' to accurately model training data memorization in Diffusion Language Models (DLMs), finding that bidirectional masking significantly increases the extractab…
Junxia Cui, Haotian Ye, Runchu Tian, Hongcan Guo +8 more
The paper proposes SimSD, a plug-and-play speculative decoding algorithm that adapts diffusion language models (dLLMs) to achieve fast, token-level acceleration by restoring causal masking capabilitie…
Paul Jünger, Justin Lovelace, Linxi Zhao, Dongyoung Go +1 more
The paper introduces SARDI, a novel, training-free framework that uses low-confidence 'lookahead' tokens generated during the denoising process of discrete diffusion language models to dynamically gui…
The paper proposes a novel global sketch-based watermarking technique for diffusion language models that controls the entire sequence's statistics, offering an order-agnostic and context-independent a…
The paper proposes DLLM-VSR, a novel Diffusion Large Language Model framework for Visual Speech Recognition, achieving state-of-the-art performance by treating transcription as iterative masked denois…
Xiaoyou Wu, Cheng-Jhih Shih, Binfei Ji, Yong Liu +1 more
BlockBatch introduces a novel framework that efficiently accelerates diffusion language model (dLLM) inference by simultaneously executing multiple block-size branches for a single request, achieving…
Longxuan Yu, Shaorong Zhang, Yu Fu, Hui Liu +2 more
The paper introduces D3IM, a novel parameter-free sampler that enables direct revision of visible tokens in Masked Diffusion Language Models, and proposes SCOPE to mitigate the model's tendency to per…
Shengfang Zhai, Xiaoyang Ji, Yuling Shi, Haoran Gao +5 more
The paper introduces BadDLM, a unified framework that demonstrates a new class of backdoor vulnerabilities in Diffusion Language Models (DLMs) by exploiting their forward masking process across divers…
The paper proposes SafeDIG, a robust safety steering framework that adapts Diffusion Transformers for text-to-image generation by treating safety control as position-aware sparse feature transfer, ens…
The paper analyzes order-agnostic language models (OALMs), finding that their learned conditionals are not true factorizations and proposing a variance-based diagnostic to compare the quality of diffe…
This paper demonstrates that Sparse Autoencoders (SAEs) can effectively steer Large Language Models (LLMs) on the AxBench benchmark, achieving performance comparable to LoRA baselines when combined wi…
The paper introduces NaRA, a noise-aware LoRA technique that dynamically adapts fine-tuning parameters based on the noise level during diffusion, significantly improving the performance of Diffusion L…
Sicheng Feng, Zigeng Chen, Gongfan Fang, Xinyin Ma +1 more
dMoE proposes a block-level Mixture-of-Experts (MoE) framework for Diffusion Large Language Models (dLLMs) that aggregates token-level expert distributions into a unified block-level distribution, sig…