~ similar to 2605.31535· 17 results
Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li +5 more
RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significant…
Minkyung Kwon, Jinhyeok Choi, Youngjin Shin, Jaeyeong Kim +2 more
MORPHOS is a novel autoregressive framework that generates dynamic 3D assets (like meshes and radiance fields) from videos by using a unified 4D representation to ensure temporal consistency and handl…
The paper proposes a fast and lightweight novel view synthesis method using a differentiable Multiplane Image (MPI) representation, achieving significant speed and size improvements over state-of-the-…
GeM-NR proposes a novel, training-free framework to achieve general multi-view image editing, enabling consistent edits that drastically change both the geometry and appearance of a nonrigid scene.
Kangrui Wang, Linjie Li, Zhengyuan Yang, Shiqi Chen +6 more
The paper addresses the challenge of multi-turn view planning for VLMs by proposing an iterative framework that uses self-exploration and view graph distillation, significantly improving planning perf…
Xuanyi Liu, Deyi Ji, Liqun Liu, Lanyun Zhu +7 more
CamGeo is a novel framework that improves sparse camera-conditioned image-to-video generation by distilling rich 3D geometric priors into the diffusion backbone, resulting in geometrically consistent…
Lu Liu, Huiyu Duan, Chenxin Zhu, Jintong Lu +5 more
The paper introduces LL-Bench, a comprehensive benchmark for evaluating large-scale generative models on low-level vision tasks, and proposes LL-Score, an MLLM-based evaluator that better aligns quali…
TROPHIES introduces a unified framework to jointly reconstruct dynamic humans, static scenes, and camera poses from multi-view videos, achieving globally consistent and physically plausible 4D reconst…
The paper introduces an adaptive feature-optimized vision front end that intelligently selects and budgets visual features for 3D reconstruction, significantly improving reconstruction quality and com…
The paper introduces Staged Executable Inverse Graphics (SEIG), an agentic framework that uses general-purpose Vision-Language Models (VLMs) to reconstruct editable 3D scenes directly into executable…
Jiazheng Xing, Hangjie Yuan, Lingling Cai, Xinyu Liu +8 more
Lumos-Nexus is a training-efficient framework that enhances video generation quality by progressively bridging generation from a lightweight model to a high-fidelity generator in a shared latent space…
PixVOD proposes a fully parallelizable, pixel-distributed framework for visual odometry and depth estimation that performs computations directly on the sensor using Gaussian Belief Propagation.
Zezhong Qian, Zhao Yang, Lu Tan, Zhihao Yan +3 more
The paper introduces CityGen, a diffusion-based framework that enables zero-label city adaptation for autonomous driving by synthesizing city-style data conditioned on HD maps and visual prompts, sign…
Jiayi Wu, Haoming Cai, Cornelia Fermuller, Christopher Metzler +1 more
Real2SAM2Real introduces a framework that uses explicit 3D caches, derived from 3D lifting models, to provide robust geometric guidance to Video Diffusion Models, significantly improving spatiotempora…
PatchPoison introduces a lightweight dataset-poisoning method that injects small, high-frequency adversarial patches into multi-view image datasets to systematically corrupt feature matching and degra…
Minseok Joo, Dogyun Park, Taehoon Lee, Kyujin Lee +1 more
The paper proposes COVRAG, a depth-based memory retrieval framework that maximizes the coverage of target-view regions to significantly improve long-term geometric consistency in autoregressive long v…
The paper proposes BRACS, a training-free steering framework that adaptively corrects visual grounding failures in large vision-language models, significantly reducing object hallucination without sac…