Papers similar to 2605.31590

~ similar to 2605.31590· 15 results

cs.CVcs.AIRecentMay 28, 2026

SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer

Yuyang Zhao, Yicheng Pan, Qiyuan He, Jincheng Yu +5 more

SANA-Streaming introduces a novel, efficient framework that enables real-time, high-resolution streaming video-to-video editing by combining a hybrid diffusion transformer with specialized training an…

View →

cs.CLcs.AIRecentMay 28, 2026

DLM-SWAI: Steering Diffusion Language Models Before They Unmask

Hyeseon An, Yo-Sub Han

The paper introduces DLM-SWAI, a training-free method that effectively steers diffusion language models (DLMs) toward desired textual styles or properties by biasing the token distribution at each den…

View →

cs.AIRecentMay 28, 2026

Robust and Generalizable Safety Steering for Text-to-Image Diffusion Transformers

Zihao Xue, Yan Wang, Zhen Bi, Long Ma +6 more

The paper proposes SafeDIG, a robust safety steering framework that adapts Diffusion Transformers for text-to-image generation by treating safety control as position-aware sparse feature transfer, ens…

View →

cs.CVRecentJun 1, 2026

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

Qixin Hu, Shuai Yang, Wei Huang, Song Han +1 more

LongLive-RAG proposes a novel Retrieval-Augmented Generation (RAG) framework to stabilize and improve the quality of long-horizon video generation by treating the entire generated history as a searcha…

View →

cs.CVcs.AIRecentMay 27, 2026

SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control

Zhida Zhang, Jie Ma, Zhan Peng, Haoxue Wu +4 more

SmartDirector is a novel framework that significantly improves cinematic video generation by using multiple keyframes to provide precise control over narrative structure and temporal pacing.

View →

cs.CVcs.AIeess.IVRecentJun 1, 2026

Towards 3D-Aware Video Diffusion Models: Render-Free Human Motion Control with Mesh Tokenization

Jingyun Liang, Min Wei, Shikai Li, Yizeng Han +4 more

The paper proposes a novel render-free framework that conditions video diffusion models directly on compressed 3D human mesh tokens, enabling robust 3D-aware human motion control without relying on re…

View →

cs.CVcs.AIRecentMay 28, 2026

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

Hidir Yesiltepe, Jiazhen Hu, Tuna Han Salih Meral, Adil Kaan Akan +3 more

VideoMLA introduces a novel Multi-Head Latent Attention (MLA) mechanism that replaces per-head KV caches with a shared low-rank content latent, significantly reducing memory and improving throughput f…

View →

cs.LGcs.AIRecentJun 1, 2026

E4GEN: Event-level Explainable Extreme-Enhanced Time-series Generation

Lin Jiang, Dahai Yu, Ximiao Li, Guang Wang

E4GEN introduces an explainable diffusion framework that significantly improves time-series generation by specifically focusing on and controlling the fidelity of extreme events.

View →

cs.CLRecentMay 31, 2026

Revise, Don't Freeze: Sampler-Matched Training for Self-Correcting Masked Diffusion Language Models

Longxuan Yu, Shaorong Zhang, Yu Fu, Hui Liu +2 more

The paper introduces D3IM, a novel parameter-free sampler that enables direct revision of visible tokens in Masked Diffusion Language Models, and proposes SCOPE to mitigate the model's tendency to per…

View →

cs.CVcs.AIRecentJun 1, 2026

Initialization is Half the Battle: Generating Diverse Images from a Guidance Potential Posterior

Xiang Li, Dianbo Liu, Kenji Kawaguchi

The paper introduces Diversity-inducing Initialization (DivIn), a novel method that improves image diversity by re-weighting the initial noise selection based on the guidance potential, thereby mitiga…

View →

cs.CVcs.AIRecentMay 31, 2026

Knowledge-Intensive Video Generation

Chenxu Wang, Mingda Chen

The paper introduces Knowledge-Intensive Video Generation (KIVI) as a challenging benchmark for evaluating video models on factuality and practical usefulness, showing that current state-of-the-art sy…

View →

cs.CLcs.AIRecentMay 31, 2026

DSL-LLaDA: Scaling Continuous Denoising to 8B Masked Diffusion LMs

Longxuan Yu, Yunshu Wu, Yu Fu, Siheng Xiong +4 more

The paper introduces DSL-LLaDA, a method that lightly adapts a pre-trained masked diffusion language model to perform continuous denoising in embedding space, significantly improving text generation q…

View →

cs.GRcs.AIcs.CVRecentMay 31, 2026

Temporally-Aligned Evaluation for Audio-Driven Talking Head Generation

Zhicheng Zhang, Lei Wang, Yu Zhang, Yongsheng Gao

The paper proposes a sequence-alignment framework using Soft Dynamic Time Warping to evaluate audio-driven talking-head generation, demonstrating that this approach provides more robust and fair compa…

View →

cs.LGcs.AIRecentJun 1, 2026

FLARE: Diffusion for Hybrid Language Model

Yuchen Zhu, Jing Shi, Chongjian Ge, Hao Tan +8 more

FLARE is a systematic conversion framework that enables a single checkpoint to support both autoregressive (AR) and diffusion-style parallel decoding for hybrid-attention large language models, achiev…

View →

cs.CVcs.AIcs.LGRecentMay 30, 2026

Improving Visual Representation Alignment Generation with GRPO

Shentong Mo, Sukmin Yun

The paper proposes VRPO, a reinforcement learning-based optimization strategy that replaces static alignment losses in diffusion models, significantly improving both convergence and image fidelity.

View →