~ similar to 2606.03985 · 15 results
Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li +5 more
RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significant…
Tianyi Xie, Haotian Zhang, Jinhyung Park, Zi Wang +16 more
This paper presents GRAIL, a digital generation pipeline that synthesizes human-object interactions for humanoid robots.
Sizhe Lester Li, Evan Kim, Xingjian Bai, Tong Zhao +3 more
The paper proposes VERA, a decoupled policy that uses an action-free video world model combined with an embodiment-specific Inverse Dynamics Model (IDM) to achieve generalizable, zero-shot robot contr…
Dong Jing, Jingchen Nie, Tianqi Zhang, Jiaqi Liu +3 more
TempoVLA is a novel Vision-Language-Action model that enables controllable execution speed for robot manipulation by explicitly conditioning the policy on the desired speed.
Yiheng Li, Zhuo Li, Ruibing Hou, Yingjie Chen +3 more
The paper introduces AnyMo, a unified multimodal framework that enables high-quality, scalable conditional human motion generation by leveraging a massive, cross-modal dataset and a masked modeling tr…
The paper proposes a novel cross-axis feature fusion architecture and an auxiliary joint-difference prediction task to significantly improve text-based 3D human motion editing by better understanding…
The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without r…
The paper proposes SWIM, a novel imitation learning method that can synthesize physically-based swimming motions from a single example, demonstrating superior data efficiency and generalization across…
Beichen Shao, Mengying Xie, Heng Su, Wanyi Zhang +4 more
GSAM introduces a generalizable and safe robotic framework for articulated object manipulation, significantly improving success rates and reducing variability across diverse tasks by integrating commo…
The paper formally addresses the challenging question of cross-domain transferability of latent predictive models by proposing a structured framework that quantifies the relationship between source an…
The paper introduces a diagnostic framework to determine if World-Action Models (WAMs) provide genuinely actionable behavioral improvements beyond simply achieving task success, finding that WAMs ofte…
The paper proposes BitTP, a lightweight bitlinear architecture that quantizes LLM-based trajectory predictors to 1.58-bit weights while keeping activations full-precision, enabling high-performance de…
Renhao Zhang, Haotian Fu, Mingxi Jia, George Konidaris +2 more
The Parameterized Diffusion Policy (PDP) framework transforms diffusion models from general stochastic generators into precise, steerable tools for learning and adapting complex robotic behaviors by e…
Qiuyue Wang, Mingsheng Li, Jian Guan, Jinhui Ye +36 more
Qwen-VLA introduces a unified embodied foundation model that extends vision-language understanding to continuous action generation, enabling robust, multi-task generalization across diverse robotic ta…
Lizhi Yang, Junheng Li, Nehar Poddar, Yiling Hou +4 more
This paper proposes a compact, explicit interface for humanoid robots that enables diverse manipulation skills and demonstrates its feasibility through natural-language-driven task roll-outs.