~ similar to 2605.29425· 19 results
This paper demonstrates that reasoning-enabled Vision-Language-Action (VLA) models for autonomous driving are highly vulnerable to realistic input perturbations, significantly compromising both reason…
DeepIPCv3 is a novel multi-modal framework that fuses LiDAR and DVS event streams using cross-modal attention to achieve state-of-the-art, highly reactive avoidance maneuvers for sudden pedestrian cro…
Jingtao He, Hongliang Lu, Xiaoyun Qiu, Yixuan Wang +1 more
The paper introduces a structured multi-level visual perturbation framework to systematically analyze how dependent VLA-based driving behavior is on visual information, revealing uneven visual groundi…
The paper introduces a new benchmark (BGTD) and a multimodal framework (mmTraffic) that enables explainable, evidence-grounded interpretation of encrypted network traffic using LLMs.
Yusong Zhao, Yuejin Xie, Youliang Yuan, Junjie Hu +3 more
The paper introduces PaSBench-Video, a comprehensive streaming video benchmark designed to rigorously test multimodal LLMs' ability to issue proactive safety warnings, finding that current models stru…
The paper develops a trust-aware framework to model how connected vehicles adapt their routing decisions and overall traffic flow when exposed to misinformation, showing that endogenous trust provides…
The paper introduces an uncertainty-aware framework that uses regulated expert advice to guide safe and efficient exploration for autonomous driving policies, significantly improving performance in co…
The paper introduces CAN-QA, a novel question-answering benchmark that reformulates CAN traffic analysis from a classification task to a reasoning task, demonstrating that current LLMs struggle with c…
This paper presents an open-source computer vision pipeline for classifying vehicle body types from naturalistic roadway video.
The paper proposes a zero-shot reason-then-retrieve pipeline using Qwen3.5-27B to solve the challenging task of composed video retrieval (CoVR-R), achieving high performance on both validation and bli…
Yan Wang, Zhixuan Chu, Zihao Xue, Zhen Bi +8 more
The paper introduces ConsisGuard, a framework that addresses the 'deliberation-to-enforcement gap' in LLM guardrails by ensuring that the reasoning process is faithfully and consistently translated in…
Qingwen Pu, Kun Xie, Hong Yang, Di Yang +1 more
The paper develops a novel deep reinforcement learning framework, SMamba-DDPG, to accurately model vehicle-type-specific pedestrian crash avoidance behavior, finding that pedestrians react faster and…
The paper proposes a multi-resolution end-to-end deep neural network for autonomous driving that dynamically adjusts input resolution to optimize the critical tradeoff between prediction accuracy and…
The paper introduces a stealthy, scenario-realistic data fabrication attack that subtly manipulates object poses in shared perception data to induce unsafe driving behaviors in connected and autonomou…
Siyan Li, Zehao Wang, Jiachen Li, Kanok Boriboonsomsin +2 more
This survey reviews how Large and Multi-modal Language Models (LLMs/MM-LLMs) are being applied to integrate diverse data sources for enhanced decision support in transportation systems management and…
Ruina Hu, Chen Wang, Lai Wei, Jionghao Bai +4 more
The paper introduces EASE, a method that enhances multimodal Reinforcement Learning with Verifiable Rewards (RLVR) by providing spatial attention supervision anchored to visual evidence, significantly…
Ke Xu, Yuhao Wang, Ziyang Cheng, Hongcheng Liu +2 more
The paper introduces MOV-Bench, a challenging benchmark for multi-hop audio-visual reasoning, and proposes AOP-Agent, an agentic framework that significantly improves open-source Omni-LLMs' ability to…
Mingjian Gao, Wenqiao Zhang, Yuqian Yuan, Yang Dai +8 more
VISUALTHINK-VLA introduces a visual intermediate-reasoning framework that guides action prediction using compact visual evidence, achieving high accuracy and significantly low latency for real-time Vi…
Shuo Ju, Qingzhao Zhang, Huashan Chen, Xuheng Wang +5 more
The paper introduces a novel adversarial attack that uses static, view-dependent camouflage on a vehicle to induce consistent feature drift, causing autonomous systems to predict false, yet plausible,…