Similar to 2606.02366 · 18 results
Tianyi Xie, Haotian Zhang, Jinhyung Park, Zi Wang +16 more
This paper presents GRAIL, a digital generation pipeline that synthesizes human-object interactions for humanoid robots.
Panfei Cheng, Hongshan Yu, Wenrui Chen, Xiaojun Tang +2 more
The paper proposes a novel symmetry-aware, category-level method for 9D object pose estimation that accurately estimates translation and size first, followed by rotation, achieving state-of-the-art re…
T2Mo is a novel framework that generates controllable dynamic 3D object shapes by combining explicit 3D trajectories for spatial guidance with natural language text semantics.
Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li +5 more
RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significant…
Inhee Lee, Sangwon Baik, Sungjoo Kim, Hyeonwoo Kim +2 more
SimuScene introduces a novel compositional 3D reconstruction pipeline that integrates physics simulation directly into the shape and layout estimation process to generate stable, simulation-ready 3D s…
The paper proposes a disentangled representation framework to significantly improve few-shot layout-to-image generation by separating semantic identity from local visual details, thereby mitigating re…
Minkyung Kwon, Jinhyeok Choi, Youngjin Shin, Jaeyeong Kim +2 more
MORPHOS is a novel autoregressive framework that generates dynamic 3D assets (like meshes and radiance fields) from videos by using a unified 4D representation to ensure temporal consistency and handl…
GeM-NR proposes a novel, training-free framework to achieve general multi-view image editing, enabling consistent edits that drastically change both the geometry and appearance of a nonrigid scene.
The paper introduces S2MDF, a plug-and-play module that enforces a hard constraint to eliminate interpenetrations in multi-object Signed Distance Field (SDF) representations, significantly improving p…
The paper introduces Staged Executable Inverse Graphics (SEIG), an agentic framework that uses general-purpose Vision-Language Models (VLMs) to reconstruct editable 3D scenes directly into executable…
The paper introduces a subgrid marching tetrahedra scheme that accurately recovers complex, intersection-free manifold meshes from tetrahedral grids, overcoming limitations of classic marching methods…
TROPHIES introduces a unified framework to jointly reconstruct dynamic humans, static scenes, and camera poses from multi-view videos, achieving globally consistent and physically plausible 4D reconst…
Yipeng Gao, Lei Shu, Genzhi Ye, Xi Xiong +4 more
The paper introduces 3DCodeBench, a systematic benchmark and platform for evaluating Vision-Language Model (VLM) agents' ability to generate procedural 3D models from text and images using code.
Shuo Lu, Yinuo Xu, Kecheng Yu, Siru Jiang +7 more
The paper introduces WorldCoder-Bench, a comprehensive benchmark and evaluation protocol for testing LLMs' ability to autonomously generate complex, physically grounded, and interactive 3D web worlds.
Aoduo Li, Jiancheng Li, Huan Ye, Hongjian Xu +4 more
VEDAL introduces a variational, error-driven asynchronous learning framework to efficiently prune 3D Gaussian Splatting, achieving high compression ratios with minimal loss in novel view synthesis qua…
The paper introduces an adaptive feature-optimized vision front end that intelligently selects and budgets visual features for 3D reconstruction, significantly improving reconstruction quality and com…
The paper introduces MetricScenes, a new large-scale, in-the-wild dataset, and demonstrates that fine-tuning existing geometry models on this dataset significantly mitigates the scale-collapse problem…
This paper investigates the robustness of world models in vision-based quadrotor navigation and identifies factors governing their quality.