VisualThink-VLA: Visual Intermediate Reasoning for Effective and Low-Latency Vision-Language-Action Policies

VLA-Trace: Diagnosing Vision-Language-Action Models through Representation and Behavior Tracing

VLA-Trace is a diagnostic framework that analyzes Vision-Language-Action (VLA) m…

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Qwen-VLA introduces a unified embodied foundation model that extends vision-lang…

VLA-Pro: Cross-Task Procedural Memory Transfer for Vision-Language-Action Models

VLA-Pro is a plug-and-play framework that enhances cross-task generalization in…

BORA: Bridging Offline Reinforcement Learning and Online Residual Adaptation for Real-World Dexterou…

BORA is an offline-to-online RL framework that enhances dexterous VLA models for…

Look on Demand: A Cognitive Scheduling Framework for Visual Evidence Acquisition in Multimodal Reaso…

The paper proposes CSMR, a cognitive scheduling framework that allows a language…

Multi-Turn Multi-Agent Dialogue for Collaborative Reconstruction Improves VLM Performance on Spatial…

The paper evaluates the performance of Vision-Language Models (VLMs) in a collab…

Decomposed On-Policy Distillation for Vision-Language Reasoning: Steering Gradients for Visual Groun…

The paper proposes Visual Gradient Steering (VGS), a method that decomposes the…

Semantic and Visual Evidence for Efficient Long-Video Reasoning: A Solution for the HD-EPIC VQA Chal…

The paper proposes a unified framework that decouples long-video reasoning into…