Papers similar to 2606.02273

Similar to 2606.02273 · 19 results

cs.CV, cs.AI · Recent · May 29, 2026

Does Visual Information Play a Decisive Role in Vision-Language-Action Model Driving Behavior?

Jingtao He, Hongliang Lu, Xiaoyun Qiu, Yixuan Wang +1 more

The paper introduces a structured multi-level visual perturbation framework to systematically analyze how dependent VLA-based driving behavior is on visual information, revealing uneven visual groundi…

cs.CL, cs.AI, cs.CV · Recent · Jun 1, 2026

PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning

Yusong Zhao, Yuejin Xie, Youliang Yuan, Junjie Hu +3 more

The paper introduces PaSBench-Video, a comprehensive streaming video benchmark designed to rigorously test multimodal LLMs' ability to issue proactive safety warnings, finding that current models stru…

cs.CV, cs.CR, cs.LG · Recent · Apr 30, 2026

Understanding Adversarial Transferability in Vision-Language Models for Autonomous Driving: A Cross-Architecture Analysis

David Fernandez, Pedram MohajerAnsari, Amir Salarpour, Mert D. Pese

This paper systematically analyzes the high cross-architecture transferability of physical adversarial attacks on Vision-Language Models (VLMs) used in autonomous driving, demonstrating that attacks e…

cs.CV · Recent · Jun 1, 2026

Multi-modal Video Representation Alignment for Robust Self-supervised Driver Distraction Detection

David J. Lerch, Livien Majer, Zeyun Zhong, Manuel Martin +2 more

The paper proposes a novel global multi-modal alignment framework to robustly learn video representations from noisy and complementary sensor data, significantly improving driver distraction detection…

cs.CR, cs.LG, cs.RO · Recent · May 27, 2026

ReasonBreak: Probing Vulnerabilities in Reasoning-Enabled Vision-Language-Action Models for Autonomous Driving

Mohammadreza Teymoorianfard, Jean-Philippe Monteuuis, Jonathan Petit, Amir Houmansadr

This paper demonstrates that reasoning-enabled Vision-Language-Action (VLA) models for autonomous driving are highly vulnerable to realistic input perturbations, significantly compromising both reason…

cs.CV, cs.LG, eess.IV · Recent · Jun 3, 2026

An Open-Source Two-Stage Computer Vision Pipeline for Fine-Grained Vehicle Classification using Vision Transformers

Gandhimathi Padmanaban, Fred Feng

This paper presents an open-source computer vision pipeline for classifying vehicle body types from naturalistic roadway video.

cs.AI · Recent · May 28, 2026

Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection

Xiaona Zhou, Muntasir Wahed, Tianjiao Yu, Constantin Brif +1 more

The paper introduces VisAnomReasoner, a parameter-efficient Vision-Language Model (VLM), trained on a new benchmark (VisAnomBench) to accurately and interpretably detect anomalies in time-series data.

cs.AI · Recent · May 28, 2026

VLA-Trace: Diagnosing Vision-Language-Action Models through Representation and Behavior Tracing

Haoyuan Shi, Xiancong Ren, Yingji Zhang, Qinfan Zhang +8 more

VLA-Trace is a diagnostic framework that analyzes Vision-Language-Action (VLA) models by tracing their internal representations and external behaviors, revealing that while these models are good at vi…

cs.CV, cs.AI · Recent · May 28, 2026

VLM3: Vision Language Models Are Native 3D Learners

Zhipeng Cai, Zhuang Liu, Yunyang Xiong, Zechun Liu +2 more

The paper proposes VLM3, a simple, scalable method that demonstrates standard Vision Language Models (VLMs) can natively learn 3D understanding by focusing on architectural simplicity and specific dat…

cs.RO, cs.AI, cs.LG · Recent · May 27, 2026

Multi-Resolution End-to-End Deep Neural Network for Optimizing Latency-Accuracy Tradeoff in Autonomous Driving

Qitao Weng, Heechul Yun

The paper proposes a multi-resolution end-to-end deep neural network for autonomous driving that dynamically adjusts input resolution to optimize the critical tradeoff between prediction accuracy and…

cs.CV, cs.AI, cs.LG · Recent · Jun 1, 2026

A Structured Benchmark for Text-Guided Anomaly Detection: When Language Stops Conditioning the Decision

Stefano Samele, Eugenio Lomurno, Teodora Jovanovic, Sanjay Shivakumar Manohar +2 more

The paper introduces a structured benchmark (TGAD) showing that current text-guided anomaly detection models often overstate their language conditioning, as performance significantly degrades when the…

cs.CR · Recent · May 8, 2026

Membership Inference Attacks on Vision-Language-Action Models

Yuefeng Peng, Mingzhe Li, Kejing Xia, Renhao Zhang +1 more

This paper presents the first systematic study of membership inference attacks (MIAs) against Vision-Language-Action (VLA) models, demonstrating that these models are highly vulnerable to privacy brea…

cs.CR, cs.CL · Recent · Apr 17, 2026

TWGuard: A Case Study of LLM Safety Guardrails for Localized Linguistic Contexts

Hua-Rong Chu, Kuan-Chun Wang, Yao-Te Huang

The paper introduces TWGuard, a linguistic context-optimized safety guardrail model, demonstrating that tailoring AI safety mechanisms to specific local linguistic contexts significantly improves perf…

cs.CV, cs.AI, cs.CL · Recent · May 31, 2026

On the Limits of Token Reduction for Efficient Unified Vision Language Training

Siyi Chen, Weiming Zhuang, Jingtao Li, Lingjuan Lv

The paper analyzes token reduction for efficient unified VLM training, finding that while task-specific acceleration saves computation, it destroys the mutual performance gains achieved through joint…

cs.CV, cs.LG, cs.RO · Recent · Jun 2, 2026

VLESA: Vision-Language Embodied Safety Agent for Human Activity Monitoring

Hanjiang Hu, Yiyuan Pan, Jiaxing Li, Xusheng Luo +4 more

VLESA is a novel framework that monitors human activities from egocentric video to predict and intervene in dangerous actions by incorporating goal-conditioned safety checks based on inferred intent.

cs.CR, cs.CV · Recent · May 12, 2026

Still Camouflage, Moving Illusion: View-Induced Trajectory Manipulation in Autonomous Driving

Shuo Ju, Qingzhao Zhang, Huashan Chen, Xuheng Wang +5 more

The paper introduces a novel adversarial attack that uses static, view-dependent camouflage on a vehicle to induce consistent feature drift, causing autonomous systems to predict false, yet plausible,…

cs.RO, cs.AI, cs.LG · Recent · May 29, 2026

Continuous Reasoning for Vision-Language-Action

Yueh-Hua Wu, Tatsuya Matsushima, Kei Ota

The paper proposes Continuous Reasoning for Vision-Language-Action (VLA) models, arguing that effective reasoning must be a shared, verifiable internal latent space rather than discrete text tokens, l…

cs.LG, cs.AI · Recent · May 27, 2026

IRDS: Interpretable RLVR Data Selection via Verifier-Coupled Sparse Autoencoder Coverage

Yuhan Li, Mingxu Zhang, Dazhong Shen, Ying Sun

IRDS introduces a novel data selection method that uses a verifier-coupled sparse autoencoder framework to efficiently select high-quality Reinforcement Learning with Verifiable Rewards (RLVR) trainin…
