Papers similar to 2605.31286

19 results

cs.RO · cs.AI · Recent · May 29, 2026

GSAM: A Generalizable and Safe Robotic Framework for Articulated Object Manipulation

Beichen Shao, Mengying Xie, Heng Su, Wanyi Zhang +4 more

GSAM introduces a generalizable and safe robotic framework for articulated object manipulation, significantly improving success rates and reducing variability across diverse tasks by integrating commo…

cs.RO · cs.AI · Recent · May 28, 2026

BORA: Bridging Offline Reinforcement Learning and Online Residual Adaptation for Real-World Dexterous VLA Models

Zhongxi Chen, Yifan Han, Yanming Shao, Huanming Liu +4 more

BORA is an offline-to-online RL framework that enhances dexterous VLA models for real-world robotics by using an action-conditioned critic and a lightweight residual adaptation mechanism to correct ex…

cs.RO · cs.AI · Recent · Jun 4, 2026

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

Dong Jing, Jingchen Nie, Tianqi Zhang, Jiaqi Liu +3 more

TempoVLA is a novel Vision-Language-Action model that enables controllable execution speed for robot manipulation by explicitly conditioning the policy on the desired speed.

cs.RO · cs.AI · cs.CL · Recent · May 28, 2026

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Qiuyue Wang, Mingsheng Li, Jian Guan, Jinhui Ye +36 more

Qwen-VLA introduces a unified embodied foundation model that extends vision-language understanding to continuous action generation, enabling robust, multi-task generalization across diverse robotic ta…

cs.RO · cs.AI · cs.CV · Recent · May 27, 2026

Turning Video Models into Generalist Robot Policies

Sizhe Lester Li, Evan Kim, Xingjian Bai, Tong Zhao +3 more

The paper proposes VERA, a decoupled policy that uses an action-free video world model combined with an embodiment-specific Inverse Dynamics Model (IDM) to achieve generalizable, zero-shot robot contr…

cs.RO · cs.AI · cs.CV · Recent · May 28, 2026

VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models

Shengyu Si, Yuanzhuo Lu, Ruimeng Yang, Ziyi Ye +2 more

VLA-Pro is a plug-and-play framework that enhances cross-task generalization in Vision-Language-Action models by storing and dynamically retrieving task-specific procedural memories, achieving signifi…

cs.RO · cs.AI · cs.CV · Empirical · Recent · Jun 11, 2026

Mana: Dexterous Manipulation of Articulated Tools

Zhao-Heng Yin, Guanya Shi, Pieter Abbeel, C. Karen Liu

This paper presents Mana, a sim-to-real framework for dexterous articulated tool manipulation.

cs.RO · cs.AI · Recent · May 31, 2026

Beyond Task Success: Behavioral and Representational Diagnostics for WAM and VLA

Hung Mai, Bin Zhu, Tuan Do

The paper introduces a diagnostic framework to determine if World-Action Models (WAMs) provide genuinely actionable behavioral improvements beyond simply achieving task success, finding that WAMs ofte…

cs.AI · Recent · May 29, 2026

Closed-Loop Neural Activation Control in Vision-Language-Action Models

Abhijith Babu, Ramneet Kaur, Nathaniel D. Bastian, Olivera Kotevska +4 more

The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without r…

cs.CR · cs.AI · cs.RO · Recent · May 18, 2026

Not What You Asked For: Typographic Attacks in Household Robot Manipulation

Ali Iranmanesh, Peng Liu

This paper demonstrates that typographic attacks pose a significant, measurable, and physically consequential threat to household robot manipulation systems by causing the robot to grasp and transport…

cs.CV · cs.AI · Recent · May 28, 2026

VisualThink-VLA: Visual Intermediate Reasoning for Effective and Low-Latency Vision-Language-Action Policies

Mingjian Gao, Wenqiao Zhang, Yuqian Yuan, Yang Dai +8 more

VisualThink-VLA introduces a visual intermediate-reasoning framework that guides action prediction using compact visual evidence, achieving high accuracy and notably low latency for real-time Vi…

cs.RO · cs.AI · cs.LG · Recent · May 29, 2026

Continuous Reasoning for Vision-Language-Action

Yueh-Hua Wu, Tatsuya Matsushima, Kei Ota

The paper proposes Continuous Reasoning for Vision-Language-Action (VLA) models, arguing that effective reasoning must be a shared, verifiable internal latent space rather than discrete text tokens, l…

cs.RO · cs.CV · Recent · Jun 1, 2026

RoboDream: Compositional World Models for Scalable Robot Data Synthesis

Junjie Ye, Rong Xue, Basile Van Hoorick, Runhao Li +5 more

RoboDream introduces an embodiment-centric world model that synthesizes photorealistic, physically feasible robot demonstrations by decoupling motion generation from environment synthesis, significant…

cs.RO · Recent · Jun 3, 2026

GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors

Tianyi Xie, Haotian Zhang, Jinhyung Park, Zi Wang +16 more

This paper presents GRAIL, a digital generation pipeline that synthesizes human-object interactions for humanoid robots.

cs.CR · cs.AI · cs.RO · Recent · Mar 24, 2026

TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches

Zhengxian Huang, Wenjun Zhu, Haoxuan Qiu, Xiaoyu Ji +1 more

This paper introduces TRAP, an adversarial attack that demonstrates how physical patches can hijack the Chain-of-Thought (CoT) reasoning process in Vision-Language-Action (VLA) models, forcing them to…

cs.RO · cs.AI · Recent · May 30, 2026

Shape Your Body: Value Gradients for Multi-Embodiment Robot Design

Nico Bohlinger, Jan Peters

The paper proposes using frozen, generalist value functions as differentiable surrogates to efficiently optimize and analyze new multi-embodiment robot designs without requiring repeated reinforceme…

cs.RO · cs.AI · cs.NE · Recent · Jun 4, 2026

Sample-efficient Low-level Motion Planning for Robotic Manipulation Tasks via Zero-shot Transfer Learning

Yuanzhi He, Victor Romero-Cano, José J. Patiño, Juan David Hernández +2 more

The paper proposes an iCEM+TL framework that combines the Sample-efficient Cross-Entropy Method with Transfer Learning and Reward Redesign to improve robotic motion planning for complex tasks like sta…

cs.RO · Recent · Jun 3, 2026

X4Val: Learning Neural Surrogates for Variance-Reduced Policy Evaluation

Rachel Luo, Michael Watson, Apoorva Sharma, Heng Yang +5 more

This paper introduces X4Val, a framework for variance-reduced real-world metric estimation using non-paired, multi-domain data.

cs.RO · cs.AI · eess.SY · Recent · May 30, 2026

PaCo-VLA: Passivity-Shielded Compliance Prior for Contact-Rich Vision-Language-Action Manipulation

Haofan Cao, Zhaoyang Li, Zhichao You, Liang Guo +1 more

PaCo-VLA introduces a passivity-shielded compliance prior to safely bridge the gap between high-level Vision-Language-Action (VLA) semantic outputs and low-level, force-sensitive robotic control.
