ArXivCSExplorer
Built by Teycir Ben Soltane

Similar to 2606.00267 · 16 results

cs.RO · cs.AI · cs.CV · Recent · May 27, 2026

Turning Video Models into Generalist Robot Policies

Sizhe Lester Li, Evan Kim, Xingjian Bai, Tong Zhao +3 more

The paper proposes VERA, a decoupled policy that uses an action-free video world model combined with an embodiment-specific Inverse Dynamics Model (IDM) to achieve generalizable, zero-shot robot contr…

cs.RO · cs.AI · Recent · May 31, 2026

Beyond Task Success: Behavioral and Representational Diagnostics for WAM and VLA

Hung Mai, Bin Zhu, Tuan Do

The paper introduces a diagnostic framework to determine if World-Action Models (WAMs) provide genuinely actionable behavioral improvements beyond simply achieving task success, finding that WAMs ofte…

cs.AI · Recent · May 28, 2026

MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models

Tianzhuo Yang, Zihan Shen, Zirui Mi, Zhaoyi Zhang +6 more

The paper introduces MiraBench, a new benchmark that evaluates the action-conditioned reliability of robotic world models, finding that visual fidelity is insufficient and that optimism bias is a perv…

cs.AI · Recent · Jun 1, 2026

TERRA: Task-Embedded Reasoning and Representation Architecture for Cross-Domain Applications

Shayan Shokri

The paper formally addresses the challenging question of cross-domain transferability of latent predictive models by proposing a structured framework that quantifies the relationship between source an…

cs.CR · cs.AI · cs.LG · Recent · Apr 1, 2026

Safety, Security, and Cognitive Risks in World Models

Manoj Parmar

This paper surveys the risks associated with world models, proposing a unified threat model and demonstrating adversarial attacks that show world models require rigorous safety standards comparable to…

cs.CV · cs.AI · Recent · May 30, 2026

Pause and Think: A Dataset and Benchmark for Video-Grounded Assistive Action Suggestion

Shivam Singh, Saptarshi Majumdar, Pratik Prabhanjan, Zicheng Liu +1 more

The paper introduces pause-and-think-T, a reasoning-centric dataset and benchmark that enables compact Vision-Language Models to perform visually grounded, context-aware action suggestion, matching la…

cs.RO · Recent · Jun 3, 2026

Generalization of World Models under Environmental Variability for Vision-based Quadrotor Navigation

Luca Zanatta, Grzegorz Malczyk, Kostas Alexis

This paper investigates the robustness of world models in vision-based quadrotor navigation and identifies factors governing their quality.

cs.CV · cs.CL · cs.RO · Recent · Jun 1, 2026

RoboTrustBench: Benchmarking the Trustworthiness of Video World Models for Robotic Manipulation

Huiqiong Li, Jiayu Wang, Zhiting Mei, Anirudha Majumdar +2 more

The paper introduces RoboTrustBench, a comprehensive benchmark that evaluates the trustworthiness of video world models for robotic manipulation across challenging scenarios, finding that current mode…

cs.RO · Recent · Jun 3, 2026

X4Val: Learning Neural Surrogates for Variance-Reduced Policy Evaluation

Rachel Luo, Michael Watson, Apoorva Sharma, Heng Yang +5 more

This paper introduces X4Val, a framework for variance-reduced real-world metric estimation using non-paired, multi-domain data.

cs.RO · cs.CV · Recent · Jun 1, 2026

RoboDream: Compositional World Models for Scalable Robot Data Synthesis

Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li +5 more

RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significant…

cs.CV · cs.CL · Recent · May 30, 2026

Decomposed On-Policy Distillation for Vision-Language Reasoning: Steering Gradients for Visual Grounding

Hee Suk Yoon, Eunseop Yoon, Jaehyun Jang, SooHwan Eom +5 more

The paper proposes Visual Gradient Steering (VGS), a method that decomposes the distillation loss into language and visual components and steers the optimization to prioritize visual grounding, signif…

cs.CV · Recent · Jun 1, 2026

From Zero to Hero: Training-Free Custom Concept Spawning in World Models

Kiymet Akdemir, Pinar Yanardag

The paper introduces SPAWN, a training-free method that allows users to inject specified visual concepts into existing autoregressive world models, enabling controllable scene composition beyond the i…

cs.CR · Recent · May 2, 2026

From Stealthy Data Fabrication to Unsafe Driving: Realistic Scenario Attacks on Collaborative Perception

Qingzhao Zhang, Runting Zhang, Z. Morley Mao

The paper introduces a stealthy, scenario-realistic data fabrication attack that subtly manipulates object poses in shared perception data to induce unsafe driving behaviors in connected and autonomou…

cs.AI · Recent · May 29, 2026

Closed-Loop Neural Activation Control in Vision-Language-Action Models

Abhijith Babu, Ramneet Kaur, Nathaniel D. Bastian, Olivera Kotevska +4 more

The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without r…

cs.LG · cs.AI · Recent · May 30, 2026

Behavior-Invariant Task Representation Learning with Transformer-based World Models for Offline Meta-Reinforcement Learning

Fuyuan Qian, Menglong Zhang, Song Wang, Quanying Liu

The paper proposes a novel framework combining behavior-invariant task representation learning and a Transformer-based world model to achieve robust generalization in offline meta-reinforcement learni…

cs.LG · cs.AI · Recent · Jun 1, 2026

Policy and World Modeling Co-Training for Language Agents

Ning Lu, Baijiong Lin, Shengcai Liu, Jiahao Wu +8 more

The paper proposes PaW, a co-training framework that uses standard RL rollouts to provide auxiliary world model supervision directly during policy training, significantly improving language agent perf…
