Papers similar to 2606.05149

~ similar to 2606.05149· 19 results

cs.CLcs.AIcs.CVRecentJun 1, 2026

PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning

Yusong Zhao, Yuejin Xie, Youliang Yuan, Junjie Hu +3 more

The paper introduces PaSBench-Video, a comprehensive streaming video benchmark designed to rigorously test multimodal LLMs' ability to issue proactive safety warnings, finding that current models stru…

View →

cs.CVRecentJun 1, 2026

Vision-language Models for Driver Monitoring Systems: A Driver Activity Description Dataset

David J. Lerch, Sarath Mulugurthi, Manuel Martin, Frederik Diederichs +1 more

The paper addresses the difficulty of using general vision-language models (VLMs) for fine-grained driver behavior recognition by creating a new, richly described dataset and demonstrating that fine-t…

View →

cs.CRcs.AIcs.DCRecentMar 19, 2026

FedTrident: Resilient Road Condition Classification Against Poisoning Attacks in Federated Learning

Sheng Liu, Panos Papadimitratos

FedTrident proposes a comprehensive framework to defend Federated Learning-based Road Condition Classification against Targeted Label-Flipping Attacks, achieving robust performance comparable to non-a…

View →

cs.CVRecentJun 1, 2026

Multi-modal Video Representation Alignment for Robust Self-supervised Driver Distraction Detection

David J. Lerch, Livien Majer, Zeyun Zhong, Manuel Martin +2 more

The paper proposes a novel global multi-modal alignment framework to robustly learn video representations from noisy and complementary sensor data, significantly improving driver distraction detection…

View →

cs.CVcs.CRRecentMar 17, 2026

KidsNanny: A Two-Stage Multimodal Content Moderation Pipeline Integrating Visual Classification, Object Detection, OCR, and Contextual Reasoning for Child Safety

Viraj Panchal, Tanmay Talsaniya, Parag Patel, Meet Patel

KidsNanny is a two-stage multimodal content moderation pipeline that achieves high accuracy and efficiency in detecting child safety threats, particularly excelling in text-embedded content.

View →

cs.AIRecentMay 27, 2026

Modeling Vehicle-Type-Specific Pedestrian Crash Avoidance Behavior in Safety-Critical Interactions Using Smooth-Mamba Deep Reinforcement Learning

Qingwen Pu, Kun Xie, Hong Yang, Di Yang +1 more

The paper develops a novel deep reinforcement learning framework, SMamba-DDPG, to accurately model vehicle-type-specific pedestrian crash avoidance behavior, finding that pedestrians react faster and…

View →

cs.CVcs.AIRecentJun 1, 2026

Parameter-Efficient Fine-Tuning of Large Pretrained Models for Instance Segmentation Tasks

Nermeen Abou Baker, David Rohrschneider, Uwe Handmann

This paper investigates the application of Parameter-Efficient Fine-Tuning (PEFT) methods, specifically adapters and LoRA, to large pretrained models for instance segmentation, demonstrating that thes…

View →

cs.CVcs.AIRecentMay 29, 2026

Does Visual Information Play a Decisive Role in Vision-Language-Action Model Driving Behavior?

Jingtao He, Hongliang Lu, Xiaoyun Qiu, Yixuan Wang +1 more

The paper introduces a structured multi-level visual perturbation framework to systematically analyze how dependent VLA-based driving behavior is on visual information, revealing uneven visual groundi…

View →

cs.CVcs.AIcs.LGRecentJun 1, 2026

Ranking vs. Assignment: The Metric Mismatch in Multi-View Object Association

Matvei Shelukhan, Timur Mamedov, Aleksandr Chukhrov, Karina Kvanchiani

The paper identifies a fundamental mismatch between standard pairwise ranking metrics (like AP and FPR-95) and the true assignment objective in multi-view object association, proposing a Sinkhorn-base…

View →

cs.AIRecentJun 1, 2026

TrafficRAG: A Multimodal RAG Framework for Traffic Accident Liability Determination

Xu Li, Zedong Fu, Xinyi Li, Xun Han

TrafficRAG is a multimodal retrieval-augmented framework that automates traffic accident liability determination by integrating visual evidence, structured legal knowledge, and advanced LLM reasoning.

View →

cs.ROcs.AIcs.CVRecentMay 31, 2026

DeepIPCv3: Event-Aware Multi-Modal Sensor Fusion for Sudden Pedestrian Crossing Avoidance

Oskar Natan, Andi Dharmawan, Aufaclav Zatu Kusuma Frisky, Jazi Eko Istiyanto +1 more

DeepIPCv3 is a novel multi-modal framework that fuses LiDAR and DVS event streams using cross-modal attention to achieve state-of-the-art, highly reactive avoidance maneuvers for sudden pedestrian cro…

View →

cs.CVcs.AIcs.CRRecentMay 20, 2026

Comparative Evaluation of Deep Learning Models for Fake Image Detection

Akhitha Pakala, Mohammed Mahir Rahman, Shahzad Memon, Tauseef Ahmed

This study comparatively evaluates four CNN architectures (VGG16, ResNet50, EfficientNetB0, and XceptionNet) for fake image detection, finding VGG16 achieved the highest accuracy (91%).

View →

cs.CVcs.AIRecentJun 1, 2026

Train, Test, Re-evaluate: Schedule-Sensitive Evaluation of Generative Data for Hand Detection

Atmika Bhardwaj, Silvia Vock, Nico Steckhan

The paper demonstrates that using synthetic hand images containing accessories, generated via inpainting, significantly improves the robustness of hand detectors for safety-critical applications by cl…

View →

cs.CRcs.CYcs.LGRecentApr 21, 2026

Towards a Systematic Risk Assessment of Deep Neural Network Limitations in Autonomous Driving Perception

Svetlana Pavlitska, Christopher Gerking, J. Marius Zöllner

This paper proposes a systematic joint workflow combining HARA and TARA to comprehensively identify and analyze risks stemming from inherent limitations of Deep Neural Networks (DNNs) used in autonomo…

View →

cs.CVcs.AIRecentJun 1, 2026

Attention mechanisms and transfer learning for robust peach leaf damage classification under domain shift

Adrián Cánovas-Rodriguez, Miguel A. González-Illán, Maria Fernanda García-Cruz, Pedro Nortes Tortosa +4 more

The paper proposes an attention-enhanced deep learning framework using EfficientNet and CBAM to achieve high accuracy (93.3%) in classifying peach leaf damage, demonstrating improved robustness under…

View →

cs.CVcs.AIcs.LGRecentJun 1, 2026

A Structured Benchmark for Text-Guided Anomaly Detection: When Language Stops Conditioning the Decision

Stefano Samele, Eugenio Lomurno, Teodora Jovanovic, Sanjay Shivakumar Manohar +2 more

The paper introduces a structured benchmark (TGAD) showing that current text-guided anomaly detection models often overstate their language conditioning, as performance significantly degrades when the…

View →

cs.CRcs.CVRecentMay 12, 2026

Still Camouflage, Moving Illusion: View-Induced Trajectory Manipulation in Autonomous Driving

Shuo Ju, Qingzhao Zhang, Huashan Chen, Xuheng Wang +5 more

The paper introduces a novel adversarial attack that uses static, view-dependent camouflage on a vehicle to induce consistent feature drift, causing autonomous systems to predict false, yet plausible,…

View →

cs.CVcs.CRcs.LGRecentApr 30, 2026

Understanding Adversarial Transferability in Vision-Language Models for Autonomous Driving: A Cross-Architecture Analysis

David Fernandez, Pedram MohajerAnsari, Amir Salarpour, Mert D. Pese

This paper systematically analyzes the high cross-architecture transferability of physical adversarial attacks on Vision-Language Models (VLMs) used in autonomous driving, demonstrating that attacks e…

View →

cs.AIRecentMay 28, 2026

ReasonLight: A Multimodal Foundation Model-Enhanced Reinforcement Learning Framework for Zero-Shot Traffic Signal Control

Aoyu Pang, Maonan Wang, Yuejiao Xie, Chung Shue Chen +2 more

ReasonLight is a multimodal foundation model-enhanced RL framework that enables zero-shot traffic signal control by semantically refining RL-proposed actions using heterogeneous sensor and camera data…

View →