Diffusion Models
Generative models, DDPM, score-based methods, and image synthesis
20 papers indexed
Improving Visual Representation Alignment Generation with GRPO
The paper proposes VRPO, a reinforcement learning-based optimization strategy that replaces static alignment losses in diffusion models, significantly improving both convergence and image fidelity.
Controllable Lung Nodule Synthesis via Histogram-Regularized Latent Diffusion Models
Arunkumar Kannan, Yanbo Zhang, Han Liu, Michael Baumgartner +4 more
The paper introduces a histogram-regularized latent diffusion model to synthesize highly realistic and subtype-specific pulmonary nodules in 3D CT volumes, addressing the limitations of existing metho…
Order within Chaos: Capturing Intrinsic Energy Anomalies for AI-Manipulated Image Forgery Localization
Yiming Wang, Baiqi Wu, Qingming Li, Jiahao Chen +2 more
The paper proposes FLAME, a novel framework that detects AI-generated image forgeries by identifying intrinsic energy anomalies caused by the diffusion process, achieving state-of-the-art localization…
When Understanding Becomes a Risk: Authenticity and Safety Risks in the Emerging Image Generation Paradigm
Ye Leng, Junjie Chu, Mingjie Li, Chenhao Lin +4 more
The paper finds that while multimodal large language models (MLLMs) offer superior semantic understanding for image generation, this enhanced capability significantly increases safety risks, partic…
Dual-Guard: Dual-Channel Latent Watermarking for Provenance and Tamper Localization in Diffusion Images
JinFeng Xie, Chengfu Ou, Peipeng Yu, Xiaoyu Zhou +4 more
Dual-Guard introduces a dual-channel latent watermarking framework that simultaneously embeds global provenance and localized content anchors into diffusion images, achieving robust detection against…
Improving Combined Detection and Classification of TEM Defects via Mask-Conditioned Latent Diffusion Augmentation
Ni Li, Nuohao Liu, Ryan Jacobs, Ajay Annamareddy +4 more
The paper proposes using a mask-conditioned latent diffusion model to generate synthetic, labeled TEM images for data augmentation, achieving small but measurable performance improvements in defect de…
SHIFT: Stochastic Hidden-Trajectory Deflection for Removing Diffusion-based Watermark
Rui Bao, Zheng Gao, Xiaoyu Li, Xiaoyan Feng +2 more
The paper introduces SHIFT, a training-free attack that exploits the vulnerability of diffusion-based watermarking by stochastically deflecting the generative trajectory, achieving high removal rates…
Beyond Augmentation: Score-Guided Pathological Prior for EEG-based Depression Detection
Xiaojing Chen, Jingqi Cheng, Xu Zhao, Wan Jiang +1 more
The paper introduces Score-Guided Classification (SGC), a novel framework that uses an unsupervised anomaly score as a 'Pathological Prior' to guide EEG-based depression detection, overcoming the limi…
Graph Reconstruction from Differentially Private GNN Explanations
This paper introduces an attack, PRIVX, demonstrating that even differentially private (DP) Graph Neural Network (GNN) explanations leak enough structural information to allow an adversary to accurate…
From Noise to Control: Parameterized Diffusion Policies
Renhao Zhang, Haotian Fu, Mingxi Jia, George Konidaris +2 more
The Parameterized Diffusion Policy (PDP) framework transforms diffusion models from general stochastic generators into precise, steerable tools for learning and adapting complex robotic behaviors by e…
DiffCrossGait: Trajectory-Level Alignment for 2D-3D Cross-Modal Gait Recognition via Latent Diffusion
DiffCrossGait proposes a novel trajectory-level alignment method using latent diffusion to overcome domain discrepancies in 2D-3D gait recognition, achieving state-of-the-art performance.
Fast and Lightweight Novel View Synthesis with Differentiable Multiplane Image
The paper proposes a fast and lightweight novel view synthesis method using a differentiable Multiplane Image (MPI) representation, achieving significant speed and size improvements over state-of-the-…
diffGHOST: Diffusion based Generative Hedged Oblivious Synthetic Trajectories
The paper introduces diffGHOST, a conditional diffusion model that generates synthetic, privacy-preserving mobility trajectories by explicitly mitigating sample memorization in the latent space.
Bridging the Sim-to-Real Gap in Semiconductor Visual Program Synthesis via Input Binarization
Yusuke Ohtsubo, Kota Dohi, Koichiro Yawata, Koki Takeshita +1 more
The paper proposes a visual program synthesis framework using a VLM to generate accurate training data for semiconductor inspection, mitigating the sim-to-real gap by applying input binarization to st…
Complexity-Balanced Diffusion Splitting
The paper introduces Complexity-Balanced Splitting (CBS), a framework that efficiently allocates model capacity across the diffusion timeline by focusing computational resources on the most complex ge…
Geometric Erasure by Contrastive Velocity Matching in Rectified Flows
The paper introduces GEM, an effective concept erasure framework for Rectified Flow Transformers, by unifying trajectory-based unlearning with classic teacher-guided flow matching.
LoRA-Key: User-Centric LoRA Watermarking for Text-to-Image Diffusion Models
Yaopeng Wang, Qingliang Wang, Zhibo Wang, Huiyu Xu +4 more
LoRA-Key introduces a user-centric watermarking framework that attaches a recoverable ownership key to LoRA modules via a standalone Watermark LoRA, providing lightweight, plug-and-play copyright prot…
SEED: A Large-Scale Benchmark for Provenance Tracing in Sequential Deepfake Facial Edits
The paper introduces SEED, a large-scale benchmark dataset for tracing sequential deepfake facial edits, and proposes FAITH, a frequency-aware Transformer model that effectively detects and orders the…
HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding
Bohan Li, Shi Lian, Hankun Wang, Yiwei Guo +5 more
HoliTok introduces a novel continuous holistic tokenization model that provides a unified, high-fidelity latent representation for simultaneously supporting both speech generation and speech understan…
Equilibrated Diffusion: Frequency-aware Textual Embedding for Equilibrated Image Customization
Equilibrated Diffusion introduces a frequency-aware approach to image customization, disentangling style and subject content embeddings to achieve superior subject fidelity and text adherence.