Similar to 2606.03820 · 17 results
DASH introduces a dual-branch distillation framework to effectively compress class-conditional diffusion models by independently supervising both score branches, significantly preserving guidance fide…
The paper introduces Strong Stochastic Flow Maps (SSFMs), a novel framework that directly learns the strong solution map of additive-noise Stochastic Differential Equations (SDEs), enabling few-step s…
This paper investigates the phenomenon of 'copying' in Distribution Matching Distillation (DMD), finding that high-dimensional distillation causes student models to spontaneously reproduce the teacher…
This paper develops a perturbation theory for spherical Hellinger-Kantorovich (SHK) gradient flows, providing explicit, time-dependent bounds on divergence metrics to guarantee differential privacy fo…
The paper introduces Optimal Mixture Transport (OMT), a scalable framework that reformulates optimal transport by using mixtures of subpopulations, resulting in a unique, biconvex optimization problem…
The paper introduces Complexity-Balanced Splitting (CBS), a framework that efficiently allocates model capacity across the diffusion timeline by focusing computational resources on the most complex ge…
Yiru Yang, Junling Wang, Nishant Kumar Singh, Luohong Wu +1 more
The paper proposes a novel layer- and point-wise projection mapping combined with LoRA injection to efficiently distill knowledge from a large teacher model to a small student model, significantly impr…
The paper proposes FOAM, an adaptive damping method that stabilizes the Shampoo optimization algorithm by dynamically controlling damping and eigendecomposition frequency, thereby reducing staleness-i…
Renhao Zhang, Haotian Fu, Mingxi Jia, George Konidaris +2 more
The Parameterized Diffusion Policy (PDP) framework transforms diffusion models from general stochastic generators into precise, steerable tools for learning and adapting complex robotic behaviors by e…
The paper introduces a Jacobian-based spectral audit to evaluate neural operators, demonstrating that standard prediction error metrics fail to capture crucial local dynamical structures and operator…
Jinyang Du, Shenghao Jin, Ziqian Xu, Ruihao Gong +4 more
The paper proposes a compression pipeline combining few-step distillation and low-bit quantization to significantly reduce the deployment cost and parameter footprint of large dual-expert video diffus…
The paper introduces TRACER, a novel regularization framework that uses Weighted Moving Average (WMA) distillation to robustly fine-tune multimodal models, mitigating catastrophic forgetting and improv…
Udbhav Bamba, Arnav Chavan, Aryamaan Thakur, Steve Teig +1 more
DOT-MoE introduces a novel framework that treats the decomposition of dense layers into a Mixture of Experts (MoE) as a Differentiable Optimal Transport problem, achieving superior efficiency while pres…
Xiaohang Tang, Keyue Jiang, Che Liu, Qifang Zhao +3 more
The paper proposes Guided Denoiser Self-Distillation (GDSD), a novel method that bypasses the use of likelihood surrogates (such as the ELBO) in RL for diffusion language models, achieving state-of-the-art p…
Yue Li, Linying Xue, Kaiqing Lin, Hanyu Quan +4 more
The paper proposes AEGIS, a novel diffusion-guided method for injecting adversarial perturbations into the latent space to create generalizable and robust defenses against advanced facial deepfake man…
Yuduo Li, Xiaofeng Shi, Qian Kou, Longbin Yu +1 more
RAFT proposes a two-stage framework combining data refinement and adaptive distillation to improve domain-specific fine-tuning while mitigating the loss of general model capabilities.
Zibo Diao, Jingchu Gai, Xinyue Ai, Zhang Zhang +2 more
The paper introduces Lossless Anti-Distillation Sampling (LADS), a novel sampling scheme that makes harvested data correlated for malicious distillers while ensuring benign users receive statistically…