~ similar to 2606.00267· 16 results
Sizhe Lester Li, Evan Kim, Xingjian Bai, Tong Zhao +3 more
The paper proposes VERA, a decoupled policy that uses an action-free video world model combined with an embodiment-specific Inverse Dynamics Model (IDM) to achieve generalizable, zero-shot robot contr…
The paper introduces a diagnostic framework to determine if World-Action Models (WAMs) provide genuinely actionable behavioral improvements beyond simply achieving task success, finding that WAMs ofte…
Tianzhuo Yang, Zihan Shen, Zirui Mi, Zhaoyi Zhang +6 more
The paper introduces MiraBench, a new benchmark that evaluates the action-conditioned reliability of robotic world models, finding that visual fidelity is insufficient and that optimism bias is a perv…
The paper formally addresses the challenging question of cross-domain transferability of latent predictive models by proposing a structured framework that quantifies the relationship between source an…
This paper surveys the risks associated with world models, proposing a unified threat model and demonstrating adversarial attacks that show world models require rigorous safety standards comparable to…
The paper introduces pause-and-think-T, a reasoning-centric dataset and benchmark that enables compact Vision-Language Models to perform visually grounded, context-aware action suggestion, matching la…
This paper investigates the robustness of world models in vision-based quadrotor navigation and identifies factors governing their quality.
Huiqiong Li, Jiayu Wang, Zhiting Mei, Anirudha Majumdar +2 more
The paper introduces RoboTrustBench, a comprehensive benchmark that evaluates the trustworthiness of video world models for robotic manipulation across challenging scenarios, finding that current mode…
Rachel Luo, Michael Watson, Apoorva Sharma, Heng Yang +5 more
This paper introduces X4Val, a framework for variance-reduced real-world metric estimation using non-paired, multi-domain data.
Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li +5 more
RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significant…
Hee Suk Yoon, Eunseop Yoon, Jaehyun Jang, SooHwan Eom +5 more
The paper proposes Visual Gradient Steering (VGS), a method that decomposes the distillation loss into language and visual components and steers the optimization to prioritize visual grounding, signif…
The paper introduces SPAWN, a training-free method that allows users to inject specified visual concepts into existing autoregressive world models, enabling controllable scene composition beyond the i…
The paper introduces a stealthy, scenario-realistic data fabrication attack that subtly manipulates object poses in shared perception data to induce unsafe driving behaviors in connected and autonomou…
The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without r…
The paper proposes a novel framework combining behavior-invariant task representation learning and a Transformer-based world model to achieve robust generalization in offline meta-reinforcement learni…
Ning Lu, Baijiong Lin, Shengcai Liu, Jiahao Wu +8 more
The paper proposes PaW, a co-training framework that uses standard RL rollouts to provide auxiliary world model supervision directly during policy training, significantly improving language agent perf…