Robotics

Robot learning, manipulation, navigation, embodied AI

20 papers indexed

cs.RO · cs.CV · Empirical · Recent · Jun 23, 2026

ArtiTwinSplat: Interactable Digital Twin Reconstruction via Gaussian Splatting from RGB-D videos

Pranjal Mishra, René Zurbrügg, Max Wilder-Smith, Marco Hutter +3 more

This paper presents ArtiTwinSplat, a framework for constructing articulated, photo-realistic digital twins of objects directly from RGB-D videos in real-world environments.

cs.RO · cs.AI · cs.CV · Recent · May 27, 2026

Turning Video Models into Generalist Robot Policies

Sizhe Lester Li, Evan Kim, Xingjian Bai, Tong Zhao +3 more

The paper proposes VERA, a decoupled policy that uses an action-free video world model combined with an embodiment-specific Inverse Dynamics Model (IDM) to achieve generalizable, zero-shot robot contr…

cs.RO · Empirical · Recent · Jul 17, 2026

Handroid: Bridging Dexterous Hand and Humanoid

Ruogu Li, Chenyang Ma, Sikai Li, Zhenyu Wei +5 more

This paper introduces Handroid, a single robot platform that can function as both a dexterous hand and a humanoid robot, with interchangeable control and learning frameworks.

cs.RO · cs.AI · cs.CV · Empirical · Recent · Jun 11, 2026

Mana: Dexterous Manipulation of Articulated Tools

Zhao-Heng Yin, Guanya Shi, Pieter Abbeel, C. Karen Liu

This paper presents Mana, a sim-to-real framework for dexterous articulated tool manipulation.

cs.RO · cs.AI · cs.CL · Recent · May 28, 2026

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Qiuyue Wang, Mingsheng Li, Jian Guan, Jinhui Ye +36 more

Qwen-VLA introduces a unified embodied foundation model that extends vision-language understanding to continuous action generation, enabling robust, multi-task generalization across diverse robotic ta…

cs.RO · cs.CV · Empirical · Recent · Jul 24, 2026

Robot-Factored World Models via Robot Rendering

Byungjun Kim, Taeksoo Kim, Hyunsoo Cha, Hanbyul Joo

This paper proposes robot-factored world models for action-conditioned video prediction in robotics, which factor out action realization and robot rendering to avoid learning the realization process a…

cs.RO · cs.CV · Survey · Recent · Jul 27, 2026

Data Pyramid for Embodied Manipulation

Yifan Ye, Yankai Fu, Yaoxu Lv, Bohan Hou +25 more

This paper organizes embodied data sources for multimodal foundation models into a pyramid, focusing on real-robot, UMI-style, egocentric and exocentric, simulation, and general vision-language data.

cs.CV · cs.AI · cs.CL · Recent · May 29, 2026

Probing Collision Grounding in Vision-Language Models for Safe Human-Robot Collaboration

Jun Wang, Xiaohao Xu, Xiaonan Huang

The paper introduces TouchSafeBench, a physics-grounded benchmark, to evaluate collision grounding—the ability to predict robot-human collisions—and finds that current Vision-Language Models (VLMs) ar…

cs.CR · cs.AI · cs.CV · Recent · Mar 28, 2026

Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses

Xiao Li, Xiang Zheng, Yifeng Gao, Xinyu Xia +34 more

This survey provides a comprehensive, structured review of safety research in Embodied AI, analyzing attacks and defenses across the entire embodied pipeline to guide the development of safe, robust,…

cs.CV · Empirical · Recent · Jul 8, 2026

Scaling Mixture-of-Experts Video Pretraining for Embodied Intelligence

Shuailei Ma, Jiaqi Liao, Xinyang Wang, Jingjing Wang +23 more

This paper introduces LingBot-Video, a video pretraining paradigm for embodied intelligence using a DiT-based approach, Mixture-of-Experts framework, and extensive robot-oriented data.

cs.RO · cs.AI · cs.LG · Recent · Jun 4, 2026

HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers

Lizhi Yang, Junheng Li, Nehar Poddar, Yiling Hou +4 more

This paper proposes a compact, explicit interface for humanoid robots that enables diverse manipulation skills and demonstrates its feasibility through natural-language-driven task roll-outs.

cs.RO · cs.AI · cs.NE · Recent · Jun 4, 2026

Sample-efficient Low-level Motion Planning for Robotic Manipulation Tasks via Zero-shot Transfer Learning

Yuanzhi He, Victor Romero-Cano, José J. Patiño, Juan David Hernández +2 more

The paper proposes an iCEM+TL framework that combines the Sample-efficient Cross-Entropy Method with Transfer Learning and Reward Redesign to improve robotic motion planning for complex tasks like sta…

cs.RO · Empirical · Recent · Jul 17, 2026

Let the Body Follow: Coupled Egocentric Control for Whole-Body Robot Teleoperation

Tsung-Chi Lin, Yichen Xie, Chien-Ming Huang

This paper proposes coupled egocentric control, a body-following teleoperation approach for whole-body robot control, improving efficiency, reducing effort, and increasing ease of use.

cs.RO · cs.CV · Recent · Jun 1, 2026

RoboDream: Compositional World Models for Scalable Robot Data Synthesis

Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li +5 more

RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significant…

cs.RO · cs.AI · Recent · Jun 4, 2026

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

Dong Jing, Jingchen Nie, Tianqi Zhang, Jiaqi Liu +3 more

TempoVLA is a novel Vision-Language-Action model that enables controllable execution speed for robot manipulation by explicitly conditioning the policy on the desired speed.

cs.RO · cs.CR · Recent · May 15, 2026

Propagating Unsafe Actions in LLM Controlled Multi-Robot Collaboration via Single Robot Compromise

Zhen Huang, Zhihuang Liu, Mengxuan Luo, Weishang Wu +1 more

The paper proposes a novel attack paradigm demonstrating how compromising a single robot in an LLM-controlled multi-robot system can rapidly propagate malicious intent to cause coordinated unsafe acti…

cs.RO · cs.AI · cs.LG · Empirical · Recent · Jun 10, 2026

FACTR 2: Learning External Force Sensing for Commodity Robot Arms Improves Policy Learning

Steven Oh, Jason Jingzhou Liu, Tony Tao, Philip Han +4 more

This paper presents a data-driven method to estimate external joint torques without dedicated force sensors, enabling force-feedback teleoperation on low-cost arms.

cs.CV · cs.RO · Empirical · Recent · Jul 24, 2026

Geometric 2D Scene Graph Generation

Christoph Jahn, Urs Waldmann, Bastian Goldluecke

The paper proposes a method for constructing scene graphs to represent and characterize assembly relationships between components using a Faster R-CNN model, transformer architecture, adjacency matrix…

cs.CR · cs.AI · cs.RO · Recent · May 18, 2026

Not What You Asked For: Typographic Attacks in Household Robot Manipulation

Ali Iranmanesh, Peng Liu

This paper demonstrates that typographic attacks pose a significant, measurable, and physically consequential threat to household robot manipulation systems by causing the robot to grasp and transport…

cs.RO · cs.AI · cs.CL · Empirical · Recent · Jul 23, 2026

GS-Agent: Creating 4D Physical Worlds With Generative Simulation

Hongxin Zhang, Chunru Lin, Junyan Li, Zhou Xian +2 more

This paper introduces GS-Agent, an end-to-end multi-agent framework that generates realistic, dynamic, and controllable 4D physical worlds from natural language descriptions by emulating human creatio…
