Papers similar to 2606.00515

~ similar to 2606.00515· 20 results

cs.CRcs.AIcs.RORecentMar 24, 2026

TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches

Zhengxian Huang, Wenjun Zhu, Haoxuan Qiu, Xiaoyu Ji +1 more

This paper introduces TRAP, an adversarial attack that demonstrates how physical patches can hijack the Chain-of-Thought (CoT) reasoning process in Vision-Language-Action (VLA) models, forcing them to…

View →

cs.CVcs.AIRecentMay 28, 2026

VisualThink-VLA: Visual Intermediate Reasoning for Effective and Low-Latency Vision-Language-Action Policies

Mingjian Gao, Wenqiao Zhang, Yuqian Yuan, Yang Dai +8 more

VISUALTHINK-VLA introduces a visual intermediate-reasoning framework that guides action prediction using compact visual evidence, achieving high accuracy and significantly low latency for real-time Vi…

View →

cs.ROcs.AIRecentJun 4, 2026

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

Dong Jing, Jingchen Nie, Tianqi Zhang, Jiaqi Liu +3 more

TempoVLA is a novel Vision-Language-Action model that enables controllable execution speed for robot manipulation by explicitly conditioning the policy on the desired speed.

View →

cs.AIRecentMay 29, 2026

Closed-Loop Neural Activation Control in Vision-Language-Action Models

Abhijith Babu, Ramneet Kaur, Nathaniel D. Bastian, Olivera Kotevska +4 more

The paper proposes CTRL-STEER, a closed-loop framework that adaptively adjusts intervention strength to stabilize concept regulation and improve task success in Vision-Language-Action models without r…

View →

cs.CRcs.LGcs.RORecentMay 27, 2026

ReasonBreak: Probing Vulnerabilities in Reasoning-Enabled Vision-Language-Action Models for Autonomous Driving

Mohammadreza Teymoorianfard, Jean-Philippe Monteuuis, Jonathan Petit, Amir Houmansadr

This paper demonstrates that reasoning-enabled Vision-Language-Action (VLA) models for autonomous driving are highly vulnerable to realistic input perturbations, significantly compromising both reason…

View →

cs.ROcs.AIRecentMay 29, 2026

GSAM: A Generalizable and Safe Robotic Framework for Articulated Object Manipulation

Beichen Shao, Mengying Xie, Heng Su, Wanyi Zhang +4 more

GSAM introduces a generalizable and safe robotic framework for articulated object manipulation, significantly improving success rates and reducing variability across diverse tasks by integrating commo…

View →

cs.ROcs.AIRecentMay 28, 2026

BORA: Bridging Offline Reinforcement Learning and Online Residual Adaptation for Real-World Dexterous VLA Models

Zhongxi Chen, Yifan Han, Yanming Shao, Huanming Liu +4 more

BORA is an offline-to-online RL framework that enhances dexterous VLA models for real-world robotics by using an action-conditioned critic and a lightweight residual adaptation mechanism to correct ex…

View →

cs.ROcs.AIcs.LGRecentMay 29, 2026

Continuous Reasoning for Vision-Language-Action

Yueh-Hua Wu, Tatsuya Matsushima, Kei Ota

The paper proposes Continuous Reasoning for Vision-Language-Action (VLA) models, arguing that effective reasoning must be a shared, verifiable internal latent space rather than discrete text tokens, l…

View →

cs.ROcs.AIcs.LGRecentMay 27, 2026

Beyond Binary: Sim-to-Real Dexterous Manipulation with Physics-Grounded Contact Representation

Jiahe Pan, Stelian Coros, Jitendra Malik, Toru Lin

The paper introduces Center-of-Pressure (CoP), a physics-grounded tactile representation that enables robust zero-shot sim-to-real transfer for complex, contact-rich manipulation tasks.

View →

cs.ROcs.AIRecentMay 31, 2026

Beyond Task Success: Behavioral and Representational Diagnostics for WAM and VLA

Hung Mai, Bin Zhu, Tuan Do

The paper introduces a diagnostic framework to determine if World-Action Models (WAMs) provide genuinely actionable behavioral improvements beyond simply achieving task success, finding that WAMs ofte…

View →

cs.ROcs.AIRecentMay 29, 2026

DeMaVLA: A Vision-Language-Action Foundation Model for Generalizable Deformable Manipulation

Taiyi Su, Jian Zhu, Tianjian Wang, Youzhang He +8 more

DeMaVLA is a generalizable Vision-Language-Action foundation model designed for deformable object manipulation, achieving strong real-world performance on folding tasks by leveraging large-scale real-…

View →

cs.LGcs.AIcs.CRRecentApr 8, 2026

When Safety Geometry Collapses: Fine-Tuning Vulnerabilities in Agentic Guard Models

Ismail Hossain, Sai Puppala, Jannatul Ferdaus, Md Jahangir Alam +3 more

The paper demonstrates that fine-tuning safety guard models on benign data can catastrophically collapse their safety alignment, proposing Fisher-Weighted Safety Subspace Regularization (FW-SSR) to ac…

View →

cs.CRRecentJun 2, 2026

Same Weights, Different Robot: A Deployment Safety View of VLA Policies

Jianwei Tai

The paper identifies a 'deployment-safety gap' in Vision-Language-Action (VLA) policies, showing that identical model checkpoints can result in physically different and unsafe robot actions due to act…

View →

cs.CRRecentMay 8, 2026

Membership Inference Attacks on Vision-Language-Action Models

Yuefeng Peng, Mingzhe Li, Kejing Xia, Renhao Zhang +1 more

This paper presents the first systematic study of membership inference attacks (MIAs) against Vision-Language-Action (VLA) models, demonstrating that these models are highly vulnerable to privacy brea…

View →

cs.CRcs.LGRecentMay 25, 2026

Capability and Robustness Cannot Both Be Free: An Information-Theoretic Bound for Vision-Language-Action Models

Jianwei Tai

The paper establishes a theoretical information-theoretic bound proving that for Vision-Language-Action (VLA) models, capability and robustness cannot both be arbitrarily high, quantifying the trade-o…

View →

cs.ROcs.AIRecentMay 29, 2026

Completion at the Boundary (CaB): Deployable Switching with Completion-Aware Control under Limited Calibration

Yusuke Sano, Takeshi Itoga

The paper proposes Completion at the Boundary (CaB), a method that predicts event-local completion objects using Boundary-Phase Tokens to improve the reliability of multi-step, handoff-heavy tasks for…

View →

cs.CVcs.AIcs.CLRecentMay 29, 2026

Probing Collision Grounding in Vision-Language Models for Safe Human-Robot Collaboration

Jun Wang, Xiaohao Xu, Xiaonan Huang

The paper introduces TouchSafeBench, a physics-grounded benchmark, to evaluate collision grounding—the ability to predict robot-human collisions—and finds that current Vision-Language Models (VLMs) ar…

View →

cs.AIcs.CRcs.LGRecentMar 22, 2026

Silent Commitment Failure in Instruction-Tuned Language Models: Evidence of Governability Divergence Across Architectures

Gregory M. Ruddell

The paper demonstrates that many instruction-tuned language models suffer from 'silent commitment failure,' meaning they can produce confidently incorrect outputs without any warning signal, and intro…

View →

cs.CRcs.AIcs.CVRecentMar 28, 2026

Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses

Xiao Li, Xiang Zheng, Yifeng Gao, Xinyu Xia +34 more

This survey provides a comprehensive, structured review of safety research in Embodied AI, analyzing attacks and defenses across the entire embodied pipeline to guide the development of safe, robust,…

View →

cs.CRcs.AIcs.RORecentApr 29, 2026

From Prompt to Physical Actuation: Holistic Threat Modeling of LLM-Enabled Robotic Systems

Neha Nagaraja, Hayretdin Bahsi, Carlo R. da Cunha

The paper provides a holistic threat model for LLM-enabled robotic systems by analyzing how conventional, adversarial, and conversational threats propagate across the entire perception-planning-actuat…

View →