Papers similar to 2605.31120

~ similar to 2605.31120· 19 results

cs.ROcs.CVRecentJun 1, 2026

RoboDream: Compositional World Models for Scalable Robot Data Synthesis

Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li +5 more

RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significant…

View →

cs.RORecentJun 3, 2026

GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors

Tianyi Xie, Haotian Zhang, Jinhyung Park, Zi Wang +16 more

This paper presents GRAIL, a digital generation pipeline that synthesizes human-object interactions for humanoid robots.

View →

cs.CVcs.AIRecentMay 28, 2026

PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions

Omer Benishu, Gal Fiebelman, Sagie Benaim

PhyGenHOI introduces a novel framework that generates physically accurate and visually faithful 4D Human-Object Interactions by coupling generative human motion with explicit physical object simulatio…

View →

cs.ROcs.AIRecentMay 30, 2026

Shape Your Body: Value Gradients for Multi-Embodiment Robot Design

Nico Bohlinger, Jan Peters

The paper introduces using frozen, generalist value functions as differentiable surrogates to efficiently optimize and analyze new multi-embodiment robot designs without requiring repeated reinforceme…

View →

cs.ROcs.AIcs.LGRecentJun 4, 2026

HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers

Lizhi Yang, Junheng Li, Nehar Poddar, Yiling Hou +4 more

This paper proposes a compact, explicit interface for humanoid robots that enables diverse manipulation skills and demonstrates its feasibility through natural-language-driven task roll-outs.

View →

cs.CVcs.RORecentJun 2, 2026

SimuScene: Simulation-Ready Compositional 3D Scene Reconstruction from a Single Image

Inhee Lee, Sangwon Baik, Sungjoo Kim, Hyeonwoo Kim +2 more

SimuScene introduces a novel compositional 3D reconstruction pipeline that integrates physics simulation directly into the shape and layout estimation process to generate stable, simulation-ready 3D s…

View →

cs.CVRecentJun 3, 2026

Controllable Dynamic 3D Shape Generation via 3D Trajectories and Text

Jaeyeong Kim, Ines Kim, Jahyeok Koo, Seungryong Kim

T2Mo is a novel framework that generates controllable dynamic 3D object shapes by combining explicit 3D trajectories for spatial guidance with natural language text semantics.

View →

cs.ROcs.AIcs.CLRecentMay 28, 2026

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Qiuyue Wang, Mingsheng Li, Jian Guan, Jinhui Ye +36 more

Qwen-VLA introduces a unified embodied foundation model that extends vision-language understanding to continuous action generation, enabling robust, multi-task generalization across diverse robotic ta…

View →

cs.AIphysics.app-phRecentMay 29, 2026

BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs

Ben Wang, Xiaogang Li, Ruochen Gao, Peiyao Xiao +5 more

The paper introduces BilliardPhys-Bench, a new benchmark that demonstrates that current multimodal LLMs struggle with complex physical reasoning and predicting object dynamics in simulated environment…

View →

cs.AIRecentMay 28, 2026

Physically Viable World Models: A Case for Query-Conditioned Embodied AI

Adam J. Thorpe, Stepan Tretiakov, Cheng-Hsi Hsiao, Su Ann Low +5 more

The paper argues that for embodied AI to be safe and effective, world models must be physically viable, requiring a structural shift from mere observation prediction to representing the underlying phy…

View →

cs.RORecentJun 3, 2026

HORIZON: Recoverability-Governed Curriculum for Physical-Domain Scaling

Chenhao Bai, Liqin Lu, Kaijun Wang, Hui Chen +4 more

This paper studies how to scale robust robot policies by expanding physical domains in a recoverable way.

View →

cs.CVRecentJun 1, 2026

From Zero to Hero: Training-Free Custom Concept Spawning in World Models

Kiymet Akdemir, Pinar Yanardag

The paper introduces SPAWN, a training-free method that allows users to inject specified visual concepts into existing autoregressive world models, enabling controllable scene composition beyond the i…

View →

cs.ROcs.AIcs.CVRecentJun 2, 2026

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

Zekun Qi, Xuchuan Chen, Dairu Liu, Chenghuai Lin +9 more

The paper introduces Humanoid-GPT, a large-scale generative Transformer model that achieves robust zero-shot motion tracking and control by training on a massive, unified corpus of motion data.

View →

cs.ROcs.AIcs.CVEmpiricalRecentJun 11, 2026

Mana: Dexterous Manipulation of Articulated Tools

Zhao-Heng Yin, Guanya Shi, Pieter Abbeel, C. Karen Liu

This paper presents Mana, a sim-to-real framework for dexterous articulated tool manipulation.

View →

cs.ROcs.AIRecentMay 27, 2026

Visualizing Latent Phase Structures in Locomotion Policies: A Multi-Environment Study with Temporal Feature Extension

Daisuke Yasui, Toshitaka Matuki, Hiroshi Sato

The paper proposes a novel framework to visualize and uncover latent, structured motion phases in deep reinforcement learning locomotion policies by augmenting state observations with action and next-…

View →

cs.ROcs.AIcs.CVRecentMay 27, 2026

Turning Video Models into Generalist Robot Policies

Sizhe Lester Li, Evan Kim, Xingjian Bai, Tong Zhao +3 more

The paper proposes VERA, a decoupled policy that uses an action-free video world model combined with an embodiment-specific Inverse Dynamics Model (IDM) to achieve generalizable, zero-shot robot contr…

View →

cs.CVcs.AIRecentMay 31, 2026

Cross-Axis Feature Fusion with Joint-Wise Motion Difference Prediction for Text-Based 3D Human Motion Editing

Gyojin Han, Junmo Kim

The paper proposes a novel cross-axis feature fusion architecture and an auxiliary joint-difference prediction task to significantly improve text-based 3D human motion editing by better understanding…

View →

cs.AIRecentMay 27, 2026

CubePart: An Open-Vocabulary Part-Controllable 3D Generator

Yiheng Zhu, Kangle Deng, Jean-Philippe Fauconnier, Inaki Navarro +8 more

CubePart is a generative framework that enables the creation of complex 3D meshes by explicitly controlling and generating individual, semantically defined parts based on open-vocabulary text prompts.

View →

cs.CVcs.AIeess.IVRecentJun 1, 2026

Towards 3D-Aware Video Diffusion Models: Render-Free Human Motion Control with Mesh Tokenization

Jingyun Liang, Min Wei, Shikai Li, Yizeng Han +4 more

The paper proposes a novel render-free framework that conditions video diffusion models directly on compressed 3D human mesh tokens, enabling robust 3D-aware human motion control without relying on re…

View →