similar to 2606.01323 · 17 results
Paul Jünger, Justin Lovelace, Linxi Zhao, Dongyoung Go +1 more
The paper introduces SARDI, a novel, training-free framework that uses low-confidence 'lookahead' tokens generated during the denoising process of discrete diffusion language models to dynamically gui…
Wenna Lai, Haoran Xie, Guandong Xu, Qing Li +1 more
The paper proposes FiVeD, a fine-grained verification framework that uses diagnostic reasoning supervision to significantly improve the reliability and performance of Aspect Sentiment Triplet Extracti…
The paper introduces DLM-SWAI, a training-free method that effectively steers diffusion language models (DLMs) toward desired textual styles or properties by biasing the token distribution at each den…
The paper introduces BiAxisAudit, a novel framework that evaluates LLM bias by analyzing bias scores across multiple prompt formats and within the internal inconsistency of model responses, revealing…
Longxuan Yu, Yunshu Wu, Yu Fu, Siheng Xiong +4 more
The paper introduces DSL-LLaDA, a method that lightly adapts a pre-trained masked diffusion language model to perform continuous denoising in embedding space, significantly improving text generation q…
The paper proposes Alignment-Guided Score Matching (AGSM), a lightweight, reward-free post-training method that integrates contrastive alignment guidance directly into the score-matching objective of…
Yuchen Zhu, Jing Shi, Chongjian Ge, Hao Tan +8 more
FLARE is a systematic conversion framework that enables a single checkpoint to support both autoregressive (AR) and diffusion-style parallel decoding for hybrid-attention large language models, achiev…
The paper introduces Latent Terms, a method that shows dense retrieval models implicitly learn sparse, Zipfian vocabularies that can be used for classical BM25-style sparse scoring without requiring s…
Adly Templeton, Tom Conerly, Jonathan Marcus, Jack Lindsey +22 more
The paper demonstrates that sparse autoencoders can successfully extract a large set of interpretable, causally influential features from the production-scale Claude 3 Sonnet language model.
The paper proposes DLLM-VSR, a novel Diffusion Large Language Model framework for Visual Speech Recognition, achieving state-of-the-art performance by treating transcription as iterative masked denois…
Junxia Cui, Haotian Ye, Runchu Tian, Hongcan Guo +8 more
The paper proposes SimSD, a plug-and-play speculative decoding algorithm that adapts diffusion language models (dLLMs) to achieve fast, token-level acceleration by restoring causal masking capabilitie…
Zhisheng Zhang, Xiang Li, Yixuan Zhou, Jing Peng +2 more
LoSATok proposes a low-dimensional semantic-acoustic tokenizer that efficiently compresses high-dimensional audio features into a compact latent space, significantly improving the performance and effi…
This paper proposes a domain-specialized large language model, PoetryQwen, for precise translation and emotional understanding of classical poetry.
The paper introduces 'infilling extraction' to accurately model training data memorization in Diffusion Language Models (DLMs), finding that bidirectional masking significantly increases the extractab…
Bangguo Zhu, Peng Huo, Yuanbo Zhao, Zhicheng Du +2 more
The paper proposes TDPM, a time-aware diffusion model for generative recommendation, which significantly improves recommendation accuracy by explicitly modeling the non-stationary, time-evolving natur…
Xiaoyou Wu, Cheng-Jhih Shih, Binfei Ji, Yong Liu +1 more
BlockBatch introduces a novel framework that efficiently accelerates diffusion language model (dLLM) inference by simultaneously executing multiple block-size branches for a single request, achieving…
This study systematically evaluates a wide range of chunking methods for Retrieval-Augmented Generation (RAG) to assess their effectiveness and highlight the overlooked challenges associated with chun…