~ similar to 2604.05642v1· 16 results
The paper introduces a new benchmark (BGTD) and a multimodal framework (mmTraffic) that enables explainable, evidence-grounded interpretation of encrypted network traffic using LLMs.
The paper addresses the difficulty of using general vision-language models (VLMs) for fine-grained driver behavior recognition by creating a new, richly described dataset and demonstrating that fine-t…
This paper presents an open-source computer vision pipeline for classifying vehicle body types from naturalistic roadway video.
KidsNanny is a two-stage multimodal content moderation pipeline that achieves high accuracy and efficiency in detecting child safety threats, particularly excelling in text-embedded content.
The paper proposes a privacy-preserving smart surveillance framework that uses a MobileNetV2-based classifier for violence detection and employs decentralized, threshold-based encryption for evidence…
Siyan Li, Zehao Wang, Jiachen Li, Kanok Boriboonsomsin +2 more
This survey reviews how Large and Multi-modal Language Models (LLMs/MM-LLMs) are being applied to integrate diverse data sources for enhanced decision support in transportation systems management and…
Yusong Zhao, Yuejin Xie, Youliang Yuan, Junjie Hu +3 more
The paper introduces PaSBench-Video, a comprehensive streaming video benchmark designed to rigorously test multimodal LLMs' ability to issue proactive safety warnings, finding that current models stru…
This study provides the first large-scale analysis of video piracy on Telegram, quantifying its massive financial impact and developing a resilient detection framework, Anti-RIP, to combat it.
Zhengyang Tang, Yuxuan Liu, Xin Lai, Junyi Li +20 more
The paper introduces PhoneWorld, a scalable pipeline that automatically converts real-world GUI trajectories and screenshots into controllable, reproducible phone-use environments, significantly impro…
The paper introduces Multi-Clip Video (MCV) SafetyBench, a dataset demonstrating that the vulnerability of Multimodal Large Language Models (MLLMs) to jailbreaking increases with the diversity and num…
This paper systematically measured web tracking across 20 popular AI chatbots, finding that a majority share both conversational content and user identity information with third parties.
Taro Tsuchiya, Haoxiang Yu, Tina Marjanov, Alice Hutchings +2 more
This paper provides a large-scale characterization of Telegram bots, revealing that while they serve useful functions like crowdsourcing, they are also extensively used for malicious activities such a…
Xiaolin Liu, Yilun Zhu, Xiangyu Zhao, Xuehui Wang +8 more
The paper introduces Moment-Video, a new benchmark that diagnoses the ability of video MLLMs to understand brief, critical visual events, revealing that current models struggle significantly with temp…
The paper demonstrates that passive motion traces recorded during a mobile selfie capture can serve as a measurable, low-friction auxiliary signal for enhancing both spoof screening and user identity…
The paper proposes a privacy-preserving visual monitoring system that performs object detection and generates natural language alerts entirely on an edge device, ensuring GDPR compliance by never tran…
This paper introduces ComicJailbreak, a new benchmark demonstrating that structured visual narratives can effectively jailbreak Multimodal Large Language Models (MLLMs), requiring new safety alignment…