~ similar to 2606.00658· 17 results
DASH introduces a dual-branch distillation framework to effectively compress class-conditional diffusion models by independently supervising both score branches, significantly preserving guidance fide…
VideoMLA introduces a novel Multi-Head Latent Attention (MLA) mechanism that replaces per-head KV caches with a shared low-rank content latent, significantly reducing memory and improving throughput f…
The paper systematically characterizes column-level activation sparsity across various diffusion model architectures, demonstrating that element-level sparsity metrics significantly overestimate the a…
Leyi Qi, Yiming Li, Siyuan Liang, Zhengzhong Tu +1 more
The paper proposes Cert-LAS, a novel certified method for verifying model ownership in text-to-image diffusion models, which is robust against malicious signal removal attacks.
The paper introduces DiffusionHijack, a supply-chain backdoor attack that compromises the PRNG used by diffusion models to deterministically control generated images, which is successfully mitigated b…
Longxuan Yu, Yunshu Wu, Yu Fu, Siheng Xiong +4 more
The paper introduces DSL-LLaDA, a method that lightly adapts a pre-trained masked diffusion language model to perform continuous denoising in embedding space, significantly improving text generation q…
The paper develops a quantitative framework to analyze and improve flow distillation in diffusion models, providing stability guarantees and suggesting non-uniform time scheduling to reduce approximat…
This paper investigates the phenomenon of 'copying' in Distribution Matching Distillation (DMD), finding that high-dimensional distillation causes student models to spontaneously reproduce the teacher…
Ruotong Liao, Guowen Huang, Qing Cheng, Guangyao Zhai +5 more
TunerDiT introduces a training-free progressive steering method to enhance multi-event video generation using Diffusion Transformers, achieving state-of-the-art performance by explicitly managing even…
Xinxin Liu, Shiwei Gan, Xiao Liu, Yafeng Yin +2 more
InfoMerge is a novel, training-free method that significantly compresses visual tokens for Video-LLMs by estimating temporal redundancy and allocating tokens based on content richness, achieving high…
Yuyang Zhao, Yicheng Pan, Qiyuan He, Jincheng Yu +5 more
SANA-Streaming introduces a novel, efficient framework that enables real-time, high-resolution streaming video-to-video editing by combining a hybrid diffusion transformer with specialized training an…
Gaussian Shannon proposes a novel watermarking framework that treats diffusion generation as a noisy communication channel, enabling both robust tracing and exact bit-level recovery of embedded waterm…
EdgeDetect is a communication-efficient and privacy-preserving federated intrusion detection system that uses gradient binarization and homomorphic encryption to significantly reduce bandwidth usage w…
The paper proposes DLLM-VSR, a novel Diffusion Large Language Model framework for Visual Speech Recognition, achieving state-of-the-art performance by treating transcription as iterative masked denois…
JinFeng Xie, Chengfu Ou, Peipeng Yu, Xiaoyu Zhou +4 more
Dual-Guard introduces a dual-channel latent watermarking framework that simultaneously embeds global provenance and localized content anchors into diffusion images, achieving robust detection against…
Rui Bao, Zheng Gao, Xiaoyu Li, Xiaoyan Feng +2 more
The paper introduces SHIFT, a training-free attack that exploits the vulnerability of diffusion-based watermarking by stochastically deflecting the generative trajectory, achieving high removal rates…
Jinnan Yang, Yan Wang, Zhen Bi, Kehao Wu +4 more
WaveFilter is a novel, training-free framework that uses wavelet transforms to efficiently filter critical tokens in the KV cache, significantly improving the long-context performance of Diffusion LLM…