~ similar to 2605.30038· 20 results
The paper proposes VRPO, a reinforcement learning-based optimization strategy that replaces static alignment losses in diffusion models, significantly improving both convergence and image fidelity.
Equilibrated Diffusion introduces a frequency-aware approach to image customization, disentangling style and subject content embeddings to achieve superior subject fidelity and text adherence.
The paper introduces Drifting Preference Optimization (DrPO), an efficient online method for preference finetuning one-step text-to-image generators that avoids complex gradient calculations and model…
Aniket Anand, Janvijay Singh, Zhewei Sun, Dilek Hakkani-Tür +1 more
The paper demonstrates that the AI-like style introduced by post-training alignment can be measured, localized, and causally removed using a novel ablation technique called PASTA.
Longxuan Yu, Yunshu Wu, Yu Fu, Siheng Xiong +4 more
The paper introduces DSL-LLaDA, a method that lightly adapts a pre-trained masked diffusion language model to perform continuous denoising in embedding space, significantly improving text generation q…
Kai Wang, Jiale Zhang, Chengcheng Zhu, Chuang Ma +1 more
The paper proposes Hydra, a framework to stabilize and control the injection of multiple, conflicting backdoor triggers into text-to-image diffusion models, ensuring high attack reliability while main…
The paper proposes FedSAP, a framework that stabilizes federated prototype learning by delaying global alignment and enforcing inter-class structure, significantly improving representation quality und…
Longxuan Yu, Shaorong Zhang, Yu Fu, Hui Liu +2 more
The paper introduces D3IM, a novel parameter-free sampler that enables direct revision of visible tokens in Masked Diffusion Language Models, and proposes SCOPE to mitigate the model's tendency to per…
The paper proposes AHV-D&S, a novel training-free inference-time safeguard that detects and suppresses risky content in Diffusion Transformers (DiTs) by quantifying token sensitivity across attention…
Salim I. Amoukou, Emanuele Albini, Tom Bewley, Saumitra Mishra +1 more
The paper introduces Entropic Projection Alignment (EPA), a unified framework that estimates, explains, and improves model performance under distribution shift by aligning source and target distributi…
The paper introduces DLM-SWAI, a training-free method that effectively steers diffusion language models (DLMs) toward desired textual styles or properties by biasing the token distribution at each den…
DASH introduces a dual-branch distillation framework to effectively compress class-conditional diffusion models by independently supervising both score branches, significantly preserving guidance fide…
Paul Jünger, Justin Lovelace, Linxi Zhao, Dongyoung Go +1 more
The paper introduces SARDI, a novel, training-free framework that uses low-confidence 'lookahead' tokens generated during the denoising process of discrete diffusion language models to dynamically gui…
The paper introduces NaRA, a noise-aware LoRA technique that dynamically adapts fine-tuning parameters based on the noise level during diffusion, significantly improving the performance of Diffusion L…
The paper proposes a novel global sketch-based watermarking technique for diffusion language models that controls the entire sequence's statistics, offering an order-agnostic and context-independent a…
Qi Sun, Siyue Zhang, Yulin Chen, Yuxiang Xue +2 more
The paper proposes Preference Delta Aggregation (PDA), a framework that aggregates multiple weak preference signals derived from smaller model pairs using LoRA merging to significantly boost the perfo…
The paper demonstrates that clinical vision-language models (VLMs) pose a significant privacy risk by allowing de-identified images to be re-linked to original reports, and proposes a targeted differe…
The paper proposes a sequence-alignment framework using Soft Dynamic Time Warping to evaluate audio-driven talking-head generation, demonstrating that this approach provides more robust and fair compa…
The paper proposes SafeDIG, a robust safety steering framework that adapts Diffusion Transformers for text-to-image generation by treating safety control as position-aware sparse feature transfer, ens…
The paper proposes a unified, constrained optimization framework using KL divergence and likelihood constraints to achieve effective and principled unlearning in diffusion models.