~ similar to 2606.05162· 20 results
Minkyung Kwon, Jinhyeok Choi, Youngjin Shin, Jaeyeong Kim +2 more
MORPHOS is a novel autoregressive framework that generates dynamic 3D assets (like meshes and radiance fields) from videos by using a unified 4D representation to ensure temporal consistency and handl…
Jingyun Liang, Min Wei, Shikai Li, Yizeng Han +4 more
The paper proposes a novel render-free framework that conditions video diffusion models directly on compressed 3D human mesh tokens, enabling robust 3D-aware human motion control without relying on re…
Tianyi Xie, Haotian Zhang, Jinhyung Park, Zi Wang +16 more
This paper presents GRAIL, a digital generation pipeline that synthesizes human-object interactions for humanoid robots.
The paper proposes a novel cross-axis feature fusion architecture and an auxiliary joint-difference prediction task to significantly improve text-based 3D human motion editing by better understanding…
CubePart is a generative framework that enables the creation of complex 3D meshes by explicitly controlling and generating individual, semantically defined parts based on open-vocabulary text prompts.
Xuanyi Liu, Deyi Ji, Liqun Liu, Lanyun Zhu +7 more
CamGeo is a novel framework that improves sparse camera-conditioned image-to-video generation by distilling rich 3D geometric priors into the diffusion backbone, resulting in geometrically consistent…
Yiheng Li, Zhuo Li, Ruibing Hou, Yingjie Chen +3 more
The paper introduces AnyMo, a unified multimodal framework that enables high-quality, scalable conditional human motion generation by leveraging a massive, cross-modal dataset and a masked modeling tr…
Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li +5 more
RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significant…
Sizhe Lester Li, Evan Kim, Xingjian Bai, Tong Zhao +3 more
The paper proposes VERA, a decoupled policy that uses an action-free video world model combined with an embodiment-specific Inverse Dynamics Model (IDM) to achieve generalizable, zero-shot robot contr…
PhyGenHOI introduces a novel framework that generates physically accurate and visually faithful 4D Human-Object Interactions by coupling generative human motion with explicit physical object simulatio…
Inhee Lee, Sangwon Baik, Sungjoo Kim, Hyeonwoo Kim +2 more
SimuScene introduces a novel compositional 3D reconstruction pipeline that integrates physics simulation directly into the shape and layout estimation process to generate stable, simulation-ready 3D s…
The paper proposes SWIM, a novel imitation learning method that can synthesize physically-based swimming motions from a single example, demonstrating superior data efficiency and generalization across…
Qiuyue Wang, Mingsheng Li, Jian Guan, Jinhui Ye +36 more
Qwen-VLA introduces a unified embodied foundation model that extends vision-language understanding to continuous action generation, enabling robust, multi-task generalization across diverse robotic ta…
The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without r…
Jiayi Wu, Haoming Cai, Cornelia Fermuller, Christopher Metzler +1 more
Real2SAM2Real introduces a framework that uses explicit 3D caches, derived from 3D lifting models, to provide robust geometric guidance to Video Diffusion Models, significantly improving spatiotempora…
PRIMA is a framework that significantly improves 3D quadruped mesh recovery by integrating biological knowledge and a test-time adaptation strategy, achieving state-of-the-art results on diverse and c…
The paper introduces using frozen, generalist value functions as differentiable surrogates to efficiently optimize and analyze new multi-embodiment robot designs without requiring repeated reinforceme…
Beichen Shao, Mengying Xie, Heng Su, Wanyi Zhang +4 more
GSAM introduces a generalizable and safe robotic framework for articulated object manipulation, significantly improving success rates and reducing variability across diverse tasks by integrating commo…
The paper introduces S2MDF, a plug-and-play module that enforces a hard constraint to eliminate interpenetrations in multi-object Signed Distance Field (SDF) representations, significantly improving p…
OctoT2I introduces a self-evolving, agentic routing framework that efficiently selects and combines multiple Text-to-Image models, achieving high performance while significantly boosting inference spe…