Papers similar to 2606.02342

~ similar to 2606.02342· 18 results

cs.CRcs.ETcs.LGRecentApr 30, 2026

Selfie-Capture Dynamics as an Auxiliary Signal Against Deepfakes and Injection Attacks for Mobile Identity Verification

Erkka Rantahalvari, Olli Silvén, Zinelabidine Boulkenafet, Constantino Álvarez Casado

The paper demonstrates that passive motion traces recorded during a mobile selfie capture can serve as a measurable, low-friction auxiliary signal for enhancing both spoof screening and user identity…

View →

cs.CLcs.AIcs.LGRecentMay 27, 2026

Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations

Sachin Kumar

This paper systematically diagnoses the failure modes of linear deception probes in LLMs, finding that while single-direction probes are insufficient, multi-dimensional probes can recover robust detec…

View →

cs.CVcs.CRcs.LGRecentApr 30, 2026

GAFSV-Net: A Vision Framework for Online Signature Verification

Himanshu Singhal, Suresh Sundaram

GAFSV-Net introduces a novel 2D vision framework by encoding temporal signature data into a six-channel Gramian Angular Field image, significantly improving online signature verification accuracy over…

View →

cs.CRcs.AIcs.RORecentMay 18, 2026

Not What You Asked For: Typographic Attacks in Household Robot Manipulation

Ali Iranmanesh, Peng Liu

This paper demonstrates that typographic attacks pose a significant, measurable, and physically consequential threat to household robot manipulation systems by causing the robot to grasp and transport…

View →

cs.CVcs.AIcs.LGRecentJun 1, 2026

A Structured Benchmark for Text-Guided Anomaly Detection: When Language Stops Conditioning the Decision

Stefano Samele, Eugenio Lomurno, Teodora Jovanovic, Sanjay Shivakumar Manohar +2 more

The paper introduces a structured benchmark (TGAD) showing that current text-guided anomaly detection models often overstate their language conditioning, as performance significantly degrades when the…

View →

cs.CVcs.AIRecentJun 1, 2026

Moment-Video: Diagnosing Temporal Fidelity of Video MLLMs on Momentary Visual Events

Xiaolin Liu, Yilun Zhu, Xiangyu Zhao, Xuehui Wang +8 more

The paper introduces Moment-Video, a new benchmark that diagnoses the ability of video MLLMs to understand brief, critical visual events, revealing that current models struggle significantly with temp…

View →

cs.CVcs.AIcs.CLRecentJun 1, 2026

Jailbreaking Multimodal Large Language Models using Multi-Clip Video

Choongwon Kang, Seungjong Sun, Hyunmin Jun, Jang Hyun Kim

The paper introduces Multi-Clip Video (MCV) SafetyBench, a dataset demonstrating that the vulnerability of Multimodal Large Language Models (MLLMs) to jailbreaking increases with the diversity and num…

View →

cs.CLcs.AIcs.LGRecentJun 4, 2026

Operation-Guided Progressive Human-to-AI Text Transformation Benchmark for Multi-Granularity AI-Text Detection

Sondos Mahmoud Bsharat, Jiacheng Liu, Xiaohan Zhao, Tianjun Yao +8 more

The paper introduces OpAI-Bench, a novel benchmark designed to study how AI authorship signals evolve and accumulate during the progressive co-editing process between humans and AI.

View →

cs.CVcs.CRRecentMay 17, 2026

Deepfake Detection in Social Media: A Temporal Artifact Analysis Using 3D Convolutional Neural Networks

Mohammadreza Rashidi, Raja Hashim Ali, Sami Ur Rahman

This paper proposes a 3D CNN detector that leverages temporal artifacts to accurately identify high-quality deepfake videos, demonstrating robust detection even after social media re-encoding.

View →

cs.CVcs.AIRecentJun 1, 2026

Train, Test, Re-evaluate: Schedule-Sensitive Evaluation of Generative Data for Hand Detection

Atmika Bhardwaj, Silvia Vock, Nico Steckhan

The paper demonstrates that using synthetic hand images containing accessories, generated via inpainting, significantly improves the robustness of hand detectors for safety-critical applications by cl…

View →

cs.CVcs.AIcs.CRRecentApr 17, 2026

NeuroLip: An Event-driven Spatiotemporal Learning Framework for Cross-Scene Lip-Motion-based Visual Speaker Recognition

Junguang Yao, Wenye Liu, Stjepan Picek, Yue Zheng

NeuroLip proposes an event-based spatiotemporal framework for visual speaker recognition that achieves robust cross-scene generalization by capturing fine-grained lip dynamics, outperforming existing…

View →

cs.CVcs.AIcs.CLRecentJun 1, 2026

Multimodal Approaches for Visually-Rich Document Type Classification: A Comparative Analysis

Catyana Heyne, Jürgen Frikel, Filippo Riccio

The paper systematically compares multimodal transformer and LLM approaches for document type classification, finding that specialized multimodal Transformers outperform LLM-based models, especially w…

View →

cs.CVcs.CRRecentMar 17, 2026

KidsNanny: A Two-Stage Multimodal Content Moderation Pipeline Integrating Visual Classification, Object Detection, OCR, and Contextual Reasoning for Child Safety

Viraj Panchal, Tanmay Talsaniya, Parag Patel, Meet Patel

KidsNanny is a two-stage multimodal content moderation pipeline that achieves high accuracy and efficiency in detecting child safety threats, particularly excelling in text-embedded content.

View →

cs.CVcs.AIRecentMay 28, 2026

Pocket-Dentist: On-Device Dental Image Understanding via Efficient Multimodal Large Language Models

Kai Bian, Xucheng Guo, Bin Chen, Lingyan Ruan +3 more

The paper introduces Pocket-Dentist, an efficiency-aware benchmark and model that demonstrates that compact, smaller Vision-Language Models (VLMs) can outperform larger models in accuracy while drasti…

View →

cs.HCcs.CRRecentApr 8, 2026

BioMoTouch: Touch-Based Behavioral Authentication via Biometric-Motion Interaction Modeling

Zijian Ling, Jianbang Chen, Hongwei Li, Hongda Zhai +5 more

BioMoTouch is a multi-modal touch authentication framework that jointly models physiological contact structures (from capacitive screens) and behavioral motion dynamics (from inertial sensors) to achi…

View →

cs.HCcs.AIcs.CVRecentJun 1, 2026

Quantitative Movement Testing: Measuring Patient Movements from a Single Smartphone Video

Pranav Mahajan, Amanda Wall, Eleonora Maria Camerone, Julie Stebbins +6 more

The paper developed and validated Quantitative Movement Testing (QMT), a computer vision pipeline that accurately extracts 3D kinematic biomarkers from standard smartphone videos, providing an objecti…

View →

cs.CRcs.CLcs.IRRecentApr 11, 2026

Hijacking Text Heritage: Hiding the Human Signature through Homoglyphic Substitution

Robert Dilworth

This paper proposes using homoglyphic substitution, replacing characters with visually similar alternatives, as a method to degrade and prevent the extraction of personal information via adversarial s…

View →

cs.ETcs.AIcs.SDRecentMay 29, 2026

GaMi: Geometry-Agnostic Material Identification via Cross-Modal Subtractive Disentanglement

Zhiwei Chen, Yijie Li, Yimo Zhang, Shiyun Shao +8 more

GaMi is a multimodal material identification system that uses mmWave and acoustic sensing with a cross-modal subtractive disentanglement framework to achieve high accuracy (95.2%) for material identif…

View →