Papers similar to 2606.02491

~ similar to 2606.02491· 19 results

cs.CVRecentJun 3, 2026

Controllable Dynamic 3D Shape Generation via 3D Trajectories and Text

Jaeyeong Kim, Ines Kim, Jahyeok Koo, Seungryong Kim

T2Mo is a novel framework that generates controllable dynamic 3D object shapes by combining explicit 3D trajectories for spatial guidance with natural language text semantics.

View →

cs.CVRecentJun 1, 2026

Retrieve What's Missing: Coverage-Maximizing Retrieval for Consistent Long Video Generation

Minseok Joo, Dogyun Park, Taehoon Lee, Kyujin Lee +1 more

The paper proposes COVRAG, a depth-based memory retrieval framework that maximizes the coverage of target-view regions to significantly improve long-term geometric consistency in autoregressive long v…

View →

cs.CERecentMay 29, 2026

CamGeo: Sparse Camera-Conditioned Image-to-Video Generation with 3D Geometry Priors

Xuanyi Liu, Deyi Ji, Liqun Liu, Lanyun Zhu +7 more

CamGeo is a novel framework that improves sparse camera-conditioned image-to-video generation by distilling rich 3D geometric priors into the diffusion backbone, resulting in geometrically consistent…

View →

cs.CVRecentJun 1, 2026

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

Qixin Hu, Shuai Yang, Wei Huang, Song Han +1 more

LongLive-RAG proposes a novel Retrieval-Augmented Generation (RAG) framework to stabilize and improve the quality of long-horizon video generation by treating the entire generated history as a searcha…

View →

cs.CVRecentJun 1, 2026

From Zero to Hero: Training-Free Custom Concept Spawning in World Models

Kiymet Akdemir, Pinar Yanardag

The paper introduces SPAWN, a training-free method that allows users to inject specified visual concepts into existing autoregressive world models, enabling controllable scene composition beyond the i…

View →

cs.CVcs.AIcs.LGRecentMay 29, 2026

RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video

Ulrich Prestel, Stefan Andreas Baumann, Nick Stracke, Björn Ommer

RayDer introduces a unified, feed-forward transformer that simplifies self-supervised novel view synthesis (NVS) by consolidating camera estimation, scene reconstruction, and rendering into a single,…

View →

cs.ROcs.CVRecentJun 1, 2026

RoboDream: Compositional World Models for Scalable Robot Data Synthesis

Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li +5 more

RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significant…

View →

cs.CVcs.AIRecentMay 29, 2026

Real2SAM2Real: Generative 3D Caches as Complementary Context for Video Diffusion

Jiayi Wu, Haoming Cai, Cornelia Fermuller, Christopher Metzler +1 more

Real2SAM2Real introduces a framework that uses explicit 3D caches, derived from 3D lifting models, to provide robust geometric guidance to Video Diffusion Models, significantly improving spatiotempora…

View →

cs.CVcs.AIRecentMay 28, 2026

PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions

Omer Benishu, Gal Fiebelman, Sagie Benaim

PhyGenHOI introduces a novel framework that generates physically accurate and visually faithful 4D Human-Object Interactions by coupling generative human motion with explicit physical object simulatio…

View →

cs.ROcs.AIcs.CVRecentMay 27, 2026

Turning Video Models into Generalist Robot Policies

Sizhe Lester Li, Evan Kim, Xingjian Bai, Tong Zhao +3 more

The paper proposes VERA, a decoupled policy that uses an action-free video world model combined with an embodiment-specific Inverse Dynamics Model (IDM) to achieve generalizable, zero-shot robot contr…

View →

cs.CVcs.RORecentJun 1, 2026

Not All Points Are Equal: Uncertainty-Aware 4D LiDAR Scene Synthesis

Xiang Xu, Alan Liang, Youquan Liu, Xian Sun +4 more

The paper introduces U4D, an uncertainty-aware framework that synthesizes 4D LiDAR scenes by prioritizing the reconstruction of geometrically difficult and uncertain regions first, leading to state-of…

View →

cs.CVRecentJun 1, 2026

VEDAL: Variational Error-Driven Asynchronous Learning for 3D Gaussian Splatting Pruning

Aoduo Li, Jiancheng Li, Huan Ye, Hongjian Xu +4 more

VEDAL introduces a variational, error-driven asynchronous learning framework to efficiently prune 3D Gaussian Splatting, achieving high compression ratios with minimal loss in novel view synthesis qua…

View →

cs.CVcs.AIcs.LGRecentMay 29, 2026

Envisioning Beyond the Few: Disentangled Semantics and Primitives for Few-Shot Atypical Layout-to-Image Generation

Nan Bao, Yifan Zhao, Wenzhuang Wang, Jia Li

The paper proposes a disentangled representation framework to significantly improve few-shot layout-to-image generation by separating semantic identity from local visual details, thereby mitigating re…

View →

cs.CVRecentJun 1, 2026

Spatial-Temporal Decoupled Reference Conditioning for Identity-Preserving Text-to-Video Generation

Yuheng Chen, Teng Hu, Yuji Wang, Qingdong He +2 more

The paper proposes ST-DRC, a Spatial-Temporal Decoupled Reference Conditioning framework that effectively balances high-level semantic control and low-level identity fidelity for text-to-video generat…

View →

cs.CVRecentJun 1, 2026

Geometry-Aware Implicit Memory for Video World Models

Zhengxuan Wei, Xu Guo, Xinghui Li, Xunzhi Xiang +7 more

The paper proposes GIM-World, a geometry-aware implicit memory framework that significantly improves long-horizon video world models by explicitly encoding 3D scene geometry into a compact memory stat…

View →

cs.CVRecentJun 1, 2026

PRIMA: Boosting Animal Mesh Recovery with Biological Priors and Test-Time Adaptation

Xiaohang Yu, Ti Wang, Mackenzie Weygandt Mathis

PRIMA is a framework that significantly improves 3D quadruped mesh recovery by integrating biological knowledge and a test-time adaptation strategy, achieving state-of-the-art results on diverse and c…

View →

cs.CVRecentJun 1, 2026

VISReg: Variance-Invariance-Sketching Regularization for JEPA training

Haiyu Wu, Randall Balestriero, Morgan Levine

VISReg introduces a novel regularization technique that combines variance control with a Sliced-Wasserstein-based sketching objective to stabilize self-supervised learning, achieving state-of-the-art…

View →

cs.CVRecentJun 1, 2026

Symmetry-Aware 9D Pose Estimation with Sim(3)-Consistent Feature and Spherical Inception Convolution

Panfei Cheng, Hongshan Yu, Wenrui Chen, Xiaojun Tang +2 more

The paper proposes a novel symmetry-aware, category-level method for 9D object pose estimation that accurately estimates translation and size first, followed by rotation, achieving state-of-the-art re…

View →

cs.CVRecentJun 1, 2026

From Extrinsic to Intrinsic: Geodesic-Guided Representation Learning for 3D Geometric Data

Yuming Zhao, Junhui Hou, Qijian Zhang, Jia Qin +1 more

The paper introduces PRISM, a novel representation learning framework that learns isometric embeddings by explicitly modeling the intrinsic geodesic metric of 3D surfaces, achieving superior performan…

View →