Similar to 2605.28763 · 19 results
The paper introduces SPAWN, a training-free method that allows users to inject specified visual concepts into existing autoregressive world models, enabling controllable scene composition beyond the i…
Yipeng Gao, Lei Shu, Genzhi Ye, Xi Xiong +4 more
The paper introduces 3DCodeBench, a systematic benchmark and platform for evaluating Vision-Language Model (VLM) agents' ability to generate procedural 3D models from text and images using code.
T2Mo is a novel framework that generates controllable dynamic 3D object shapes by combining explicit 3D trajectories for spatial guidance with natural language text semantics.
The paper introduces MUSE, a comprehensive benchmark that evaluates Text-to-CAD generation by assessing complex assemblies based on functionality, manufacturability, and assemblability, moving beyond…
Shuo Lu, Yinuo Xu, Kecheng Yu, Siru Jiang +7 more
The paper introduces WorldCoder-Bench, a comprehensive benchmark and evaluation protocol for testing LLMs' ability to autonomously generate complex, physically grounded, and interactive 3D web worlds.
PhyGenHOI introduces a novel framework that generates physically accurate and visually faithful 4D Human-Object Interactions by coupling generative human motion with explicit physical object simulatio…
Tianyi Xie, Haotian Zhang, Jinhyung Park, Zi Wang +16 more
This paper presents GRAIL, a digital generation pipeline that synthesizes human-object interactions for humanoid robots.
The paper introduces Staged Executable Inverse Graphics (SEIG), an agentic framework that uses general-purpose Vision-Language Models (VLMs) to reconstruct editable 3D scenes directly into executable…
Inhee Lee, Sangwon Baik, Sungjoo Kim, Hyeonwoo Kim +2 more
SimuScene introduces a novel compositional 3D reconstruction pipeline that integrates physics simulation directly into the shape and layout estimation process to generate stable, simulation-ready 3D s…
Shaohui Dai, Yansong Qu, You Shen, Shengchuan Zhang +1 more
The paper introduces PAR3D, a unified part-aware 3D-MLLM framework, to enhance 3D scene understanding by enabling models to reason about and ground both whole objects and their fine-grained parts.
Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li +5 more
RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significant…
Haozhe Zhao, Shuzheng Si, Zhenhailong Wang, Zheng Wang +5 more
The paper introduces Crafter, a multi-agent harness that significantly improves the generation of editable, publication-quality scientific figures from diverse inputs, addressing the limitations of ex…
The paper introduces a subgrid marching tetrahedra scheme that accurately recovers complex, intersection-free manifold meshes from tetrahedral grids, overcoming limitations of classic marching methods…
Minkyung Kwon, Jinhyeok Choi, Youngjin Shin, Jaeyeong Kim +2 more
MORPHOS is a novel autoregressive framework that generates dynamic 3D assets (like meshes and radiance fields) from videos by using a unified 4D representation to ensure temporal consistency and handl…
The paper introduces SchGen, the first large language model capable of generating editable PCB schematics from natural language by using a novel semantically grounded code representation.
Jingyun Liang, Min Wei, Shikai Li, Yizeng Han +4 more
The paper proposes a novel render-free framework that conditions video diffusion models directly on compressed 3D human mesh tokens, enabling robust 3D-aware human motion control without relying on re…
pcbGPT is a grounded system that automatically generates editable KiCad PCB schematics from natural language requirements, achieving high accuracy on complex embedded design tasks.
Ben Wang, Xiaogang Li, Ruochen Gao, Peiyao Xiao +5 more
The paper introduces BilliardPhys-Bench, a new benchmark that demonstrates that current multimodal LLMs struggle with complex physical reasoning and predicting object dynamics in simulated environment…
This paper presents Mana, a sim-to-real framework for dexterous articulated tool manipulation.