Papers similar to 2606.02575

~ similar to 2606.02575· 18 results

cs.ROcs.CVRecentJun 1, 2026

RoboDream: Compositional World Models for Scalable Robot Data Synthesis

Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li +5 more

RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significant…

View →

cs.AIRecentMay 27, 2026

CubePart: An Open-Vocabulary Part-Controllable 3D Generator

Yiheng Zhu, Kangle Deng, Jean-Philippe Fauconnier, Inaki Navarro +8 more

CubePart is a generative framework that enables the creation of complex 3D meshes by explicitly controlling and generating individual, semantically defined parts based on open-vocabulary text prompts.

View →

cs.AIRecentJun 1, 2026

WorldCoder-Bench: Benchmarking Physically Grounded 3D World Synthesis

Shuo Lu, Yinuo Xu, Kecheng Yu, Siru Jiang +7 more

The paper introduces WorldCoder-Bench, a comprehensive benchmark and evaluation protocol for testing LLMs' ability to autonomously generate complex, physically grounded, and interactive 3D web worlds.

View →

cs.CVcs.AIRecentMay 28, 2026

Archon: A Unified Multimodal Model for Holistic Digital Human Generation

Chong Bao, Shichen Liu, Lijun Yu, David Futschik +8 more

The paper introduces Archon, a unified, fully pretrained multimodal model that addresses the challenge of generating holistic digital humans by integrating seven modalities (including text, audio, mot…

View →

cs.CVRecentJun 1, 2026

Thinking in Blender: Staged Executable Inverse Graphics with Vision-Language Models

Guangzhao He, Rundong Luo, Wei-Chiu Ma, Hadar Averbuch-Elor

The paper introduces Staged Executable Inverse Graphics (SEIG), an agentic framework that uses general-purpose Vision-Language Models (VLMs) to reconstruct editable 3D scenes directly into executable…

View →

cs.CVRecentJun 1, 2026

MORPHOS: Autoregressive 4D Generation with Temporal Structured Latents

Minkyung Kwon, Jinhyeok Choi, Youngjin Shin, Jaeyeong Kim +2 more

MORPHOS is a novel autoregressive framework that generates dynamic 3D assets (like meshes and radiance fields) from videos by using a unified 4D representation to ensure temporal consistency and handl…

View →

cs.CVcs.AIcs.GRRecentMay 31, 2026

3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code

Yipeng Gao, Lei Shu, Genzhi Ye, Xi Xiong +4 more

The paper introduces 3DCodeBench, a systematic benchmark and platform for evaluating Vision-Language Model (VLM) agents' ability to generate procedural 3D models from text and images using code.

View →

cs.ROcs.AIcs.CVRecentMay 27, 2026

Turning Video Models into Generalist Robot Policies

Sizhe Lester Li, Evan Kim, Xingjian Bai, Tong Zhao +3 more

The paper proposes VERA, a decoupled policy that uses an action-free video world model combined with an embodiment-specific Inverse Dynamics Model (IDM) to achieve generalizable, zero-shot robot contr…

View →

cs.RORecentJun 3, 2026

GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors

Tianyi Xie, Haotian Zhang, Jinhyung Park, Zi Wang +16 more

This paper presents GRAIL, a digital generation pipeline that synthesizes human-object interactions for humanoid robots.

View →

cs.AIRecentMay 28, 2026

Procedural Generation of First Person Shooter Maps using Map-Elites

Simone de Donato, Pier Luca Lanzi, Daniele Loiacono

This paper applies the MAP-Elites algorithm to procedurally generate diverse and high-quality First-Person Shooter maps using novel map representations.

View →

cs.RORecentJun 3, 2026

Generalization of World Models under Environmental Variability for Vision-based Quadrotor Navigation

Luca Zanatta, Grzegorz Malczyk, Kostas Alexis

This paper investigates the robustness of world models in vision-based quadrotor navigation and identifies factors governing their quality.

View →

cs.CVRecentJun 4, 2026

Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

Chenming Zhu, Jingli Lin, Yilin Long, Peizhou Cao +3 more

The paper proposes Astra, an agentic framework that equips Vision-Language Models (VLMs) with the ability to perform spatial reasoning by actively generating and utilizing imagined visual evidence fro…

View →

cs.SEcs.AIcs.CVRecentMay 27, 2026

GUI Agents for Continual Game Generation

Yixu Huang, Bo Li, Na Li, Zhe Wang +7 more

The paper proposes using GUI agents, both as objective evaluators and subjective playtesters, to significantly improve the generation of playable games from prompts, demonstrating a 66.8% rubric pass-…

View →

cs.AIphysics.app-phRecentMay 29, 2026

BilliardPhys-Bench: Benchmarking Physical Reasoning and Visual Dynamics of Multimodal LLMs

Ben Wang, Xiaogang Li, Ruochen Gao, Peiyao Xiao +5 more

The paper introduces BilliardPhys-Bench, a new benchmark that demonstrates that current multimodal LLMs struggle with complex physical reasoning and predicting object dynamics in simulated environment…

View →

cs.LGcs.AIcs.CLRecentJun 1, 2026

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

Rui Yang, Qianhui Wu, Yuxi Chen, Hao Bai +6 more

The paper introduces OpenWebRL, an open framework that enables training visual web agents using online multi-turn Reinforcement Learning directly on live websites, achieving state-of-the-art performan…

View →

cs.CVcs.AIRecentMay 28, 2026

PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions

Omer Benishu, Gal Fiebelman, Sagie Benaim

PhyGenHOI introduces a novel framework that generates physically accurate and visually faithful 4D Human-Object Interactions by coupling generative human motion with explicit physical object simulatio…

View →

cs.GRcs.AIcs.LGRecentMay 29, 2026

SWIM: Single-Instance Whole-Body Imitation for swiMming

Binglun Wang, Edmond S. L. Ho, He Wang

The paper proposes SWIM, a novel imitation learning method that can synthesize physically-based swimming motions from a single example, demonstrating superior data efficiency and generalization across…

View →

cs.CVRecentJun 1, 2026

Geometry-Aware Implicit Memory for Video World Models

Zhengxuan Wei, Xu Guo, Xinghui Li, Xunzhi Xiang +7 more

The paper proposes GIM-World, a geometry-aware implicit memory framework that significantly improves long-horizon video world models by explicitly encoding 3D scene geometry into a compact memory stat…

View →