ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2605.30268· 19 results

cs.RORecentJun 3, 2026

GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors

Tianyi Xie, Haotian Zhang, Jinhyung Park, Zi Wang +16 more

This paper presents GRAIL, a digital generation pipeline that synthesizes human-object interactions for humanoid robots.

View →
cs.GRcs.AIcs.LGRecentMay 29, 2026

SWIM: Single-Instance Whole-Body Imitation for swiMming

Binglun Wang, Edmond S. L. Ho, He Wang

The paper proposes SWIM, a novel imitation learning method that can synthesize physically-based swimming motions from a single example, demonstrating superior data efficiency and generalization across…

View →
cs.AIcs.CVRecentMay 28, 2026

PhyDrawGen: Physically Grounded Diagram Generation from Natural Language

Nafiul Haque, Syed Nazmus Sakib, Shifat E Arman

PhyDrawGen is a neuro-symbolic pipeline that generates physically accurate diagrams from natural language by explicitly enforcing physical laws and geometric constraints, significantly outperforming c…

View →
cs.ROcs.AIRecentMay 29, 2026

GSAM: A Generalizable and Safe Robotic Framework for Articulated Object Manipulation

Beichen Shao, Mengying Xie, Heng Su, Wanyi Zhang +4 more

GSAM introduces a generalizable and safe robotic framework for articulated object manipulation, significantly improving success rates and reducing variability across diverse tasks by integrating commo…

View →
cs.CVcs.AIRecentMay 28, 2026

Archon: A Unified Multimodal Model for Holistic Digital Human Generation

Chong Bao, Shichen Liu, Lijun Yu, David Futschik +8 more

The paper introduces Archon, a unified, fully pretrained multimodal model that addresses the challenge of generating holistic digital humans by integrating seven modalities (including text, audio, mot…

View →
cs.ROcs.AIcs.CVRecentMay 27, 2026

Turning Video Models into Generalist Robot Policies

Sizhe Lester Li, Evan Kim, Xingjian Bai, Tong Zhao +3 more

The paper proposes VERA, a decoupled policy that uses an action-free video world model combined with an embodiment-specific Inverse Dynamics Model (IDM) to achieve generalizable, zero-shot robot contr…

View →
cs.CVcs.RORecentJun 2, 2026

SimuScene: Simulation-Ready Compositional 3D Scene Reconstruction from a Single Image

Inhee Lee, Sangwon Baik, Sungjoo Kim, Hyeonwoo Kim +2 more

SimuScene introduces a novel compositional 3D reconstruction pipeline that integrates physics simulation directly into the shape and layout estimation process to generate stable, simulation-ready 3D s…

View →
cs.ROcs.CVRecentJun 1, 2026

RoboDream: Compositional World Models for Scalable Robot Data Synthesis

Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li +5 more

RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significant…

View →
cs.AIphysics.app-phRecentMay 29, 2026

BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs

Ben Wang, Xiaogang Li, Ruochen Gao, Peiyao Xiao +5 more

The paper introduces BilliardPhys-Bench, a new benchmark that demonstrates that current multimodal LLMs struggle with complex physical reasoning and predicting object dynamics in simulated environment…

View →
cs.ROcs.AIcs.LGRecentMay 27, 2026

Beyond Binary: Sim-to-Real Dexterous Manipulation with Physics-Grounded Contact Representation

Jiahe Pan, Stelian Coros, Jitendra Malik, Toru Lin

The paper introduces Center-of-Pressure (CoP), a physics-grounded tactile representation that enables robust zero-shot sim-to-real transfer for complex, contact-rich manipulation tasks.

View →
cs.CVRecentJun 2, 2026

NewtPhys: Do Foundation Models Understand Newtonian Physics?

Sebastian Cavada, Soumava Paul, Tuan-Hung Vu, Andrei Bursuc +1 more

The paper introduces NewtPhys, a novel 4D dataset of real-world scenes with dense physical annotations, to systematically evaluate and reveal the limitations of foundation models in low-level Newtonia…

View →
cs.CVcs.AIRecentMay 28, 2026

AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling

Yiheng Li, Zhuo Li, Ruibing Hou, Yingjie Chen +3 more

The paper introduces AnyMo, a unified multimodal framework that enables high-quality, scalable conditional human motion generation by leveraging a massive, cross-modal dataset and a masked modeling tr…

View →
cs.CVcs.AIcs.CLRecentMay 29, 2026

Probing Collision Grounding in Vision-Language Models for Safe Human-Robot Collaboration

Jun Wang, Xiaohao Xu, Xiaonan Huang

The paper introduces TouchSafeBench, a physics-grounded benchmark, to evaluate collision grounding—the ability to predict robot-human collisions—and finds that current Vision-Language Models (VLMs) ar…

View →
cs.CVRecentJun 1, 2026

MORPHOS: Autoregressive 4D Generation with Temporal Structured Latents

Minkyung Kwon, Jinhyeok Choi, Youngjin Shin, Jaeyeong Kim +2 more

MORPHOS is a novel autoregressive framework that generates dynamic 3D assets (like meshes and radiance fields) from videos by using a unified 4D representation to ensure temporal consistency and handl…

View →
cs.AIRecentMay 27, 2026

CubePart: An Open-Vocabulary Part-Controllable 3D Generator

Yiheng Zhu, Kangle Deng, Jean-Philippe Fauconnier, Inaki Navarro +8 more

CubePart is a generative framework that enables the creation of complex 3D meshes by explicitly controlling and generating individual, semantically defined parts based on open-vocabulary text prompts.

View →
cs.CVcs.AIeess.IVRecentJun 1, 2026

Towards 3D-Aware Video Diffusion Models: Render-Free Human Motion Control with Mesh Tokenization

Jingyun Liang, Min Wei, Shikai Li, Yizeng Han +4 more

The paper proposes a novel render-free framework that conditions video diffusion models directly on compressed 3D human mesh tokens, enabling robust 3D-aware human motion control without relying on re…

View →
cs.CVRecentJun 3, 2026

Controllable Dynamic 3D Shape Generation via 3D Trajectories and Text

Jaeyeong Kim, Ines Kim, Jahyeok Koo, Seungryong Kim

T2Mo is a novel framework that generates controllable dynamic 3D object shapes by combining explicit 3D trajectories for spatial guidance with natural language text semantics.

View →
cs.ROcs.AIRecentMay 30, 2026

Shape Your Body: Value Gradients for Multi-Embodiment Robot Design

Nico Bohlinger, Jan Peters

The paper introduces using frozen, generalist value functions as differentiable surrogates to efficiently optimize and analyze new multi-embodiment robot designs without requiring repeated reinforceme…

View →
cs.ROcs.AIcs.CVEmpiricalRecentJun 11, 2026

Mana: Dexterous Manipulation of Articulated Tools

Zhao-Heng Yin, Guanya Shi, Pieter Abbeel, C. Karen Liu

This paper presents Mana, a sim-to-real framework for dexterous articulated tool manipulation.

View →