~ similar to 2605.27813· 19 results
The paper introduces Residualized Sparse Autoencoders (ReSAEs) to improve multi-layer interventions in transformers by training each layer on the residual activation, which better preserves cross-laye…
The paper theoretically analyzes the properties that optimal sparse autoencoder (SAE) dictionaries must satisfy, deriving constraints that explain observed SAE behaviors like hierarchical splitting an…
Longxuan Yu, Yunshu Wu, Yu Fu, Siheng Xiong +4 more
The paper introduces DSL-LLaDA, a method that lightly adapts a pre-trained masked diffusion language model to perform continuous denoising in embedding space, significantly improving text generation q…
Zekai Li, Ji Liu, Yiqing Huang, Ziqiong Liu +2 more
The paper proposes a novel trace-aware decoding framework, combining Temporal-Spatial Parallel Decoding (TSPD) and Confidence Extrapolation (CE), to significantly accelerate the inference of diffusion…
Hongfei Du, Jiacheng Shi, Sidi Lu, Gang Zhou +1 more
The paper uses sparse autoencoders to identify specific latent features within LLM-based TTS models, enabling interpretable and fine-grained control over emotional expression by intervening in small s…
The paper systematically characterizes column-level activation sparsity across various diffusion model architectures, demonstrating that element-level sparsity metrics significantly overestimate the a…
The paper introduces a distributional framework using Wasserstein distance to unify the semantic comparison of sparse autoencoder features across different layers and to automatically compress large f…
Rishit Dagli, Abir Harrasse, Luke Zhang, Florent Draye +3 more
This paper proposes a new framework called STRIDE for training data attribution in Large Language Models.
The paper proposes DLLM-VSR, a novel Diffusion Large Language Model framework for Visual Speech Recognition, achieving state-of-the-art performance by treating transcription as iterative masked denois…
Junxia Cui, Haotian Ye, Runchu Tian, Hongcan Guo +8 more
The paper proposes SimSD, a plug-and-play speculative decoding algorithm that adapts diffusion language models (dLLMs) to achieve fast, token-level acceleration by restoring causal masking capabilitie…
The paper introduces 'infilling extraction' to accurately model training data memorization in Diffusion Language Models (DLMs), finding that bidirectional masking significantly increases the extractab…
Pengfei Jin, Yiqi Tian, Kailong Fan, Bingjie Qi +1 more
The paper introduces Robust Prior Update (RPU), a module that improves the faithfulness of diffusion-based inverse solvers by stabilizing the prior update step, thereby reducing measurement-conditione…
Bangguo Zhu, Peng Huo, Yuanbo Zhao, Zhicheng Du +2 more
The paper proposes TDPM, a time-aware diffusion model for generative recommendation, which significantly improves recommendation accuracy by explicitly modeling the non-stationary, time-evolving natur…
The paper introduces NaRA, a noise-aware LoRA technique that dynamically adapts fine-tuning parameters based on the noise level during diffusion, significantly improving the performance of Diffusion L…
The paper demonstrates that integrating Sparse Autoencoders (SAEs) into transformer residual streams significantly enhances the robustness of Large Language Models against various jailbreak attacks by…
This paper demonstrates that Sparse Autoencoders (SAEs) can effectively steer Large Language Models (LLMs) on the AxBench benchmark, achieving performance comparable to LoRA baselines when combined wi…
The paper introduces TRACER, a novel regularization framework that uses Weighted Moving Average (WMA) distillation to robustly finetune multimodal models, mitigating catastrophic forgetting and improv…
The paper introduces a novel diffusion posterior sampling method that stabilizes and accelerates data-consistent sampling by replacing hand-tuned guidance weights with a per-noise-level, curvature-gui…
VideoMLA introduces a novel Multi-Head Latent Attention (MLA) mechanism that replaces per-head KV caches with a shared low-rank content latent, significantly reducing memory and improving throughput f…