~ similar to 2603.16181v1· 18 results
The paper introduces Multi-Clip Video (MCV) SafetyBench, a dataset demonstrating that the vulnerability of Multimodal Large Language Models (MLLMs) to jailbreaking increases with the diversity and num…
Chaoshuo Zhang, Yibo Liang, Mengke Tian, Chenhao Lin +5 more
This paper introduces TwoHamsters, a new benchmark that rigorously tests Multi-Concept Compositional Unsafety (MCCU) in text-to-image models, demonstrating that current state-of-the-art models and saf…
Ye Leng, Junjie Chu, Mingjie Li, Chenhao Lin +4 more
The paper analyzes that while multimodal large language models (MLLMs) offer superior semantic understanding for image generation, this enhanced capability significantly increases safety risks, partic…
This paper introduces ComicJailbreak, a new benchmark demonstrating that structured visual narratives can effectively jailbreak Multimodal Large Language Models (MLLMs), requiring new safety alignment…
The paper introduces a generalized zero-shot benchmark for facial age estimation that ethically excludes children's data during training, demonstrating that current state-of-the-art models fail signif…
Yusong Zhao, Yuejin Xie, Youliang Yuan, Junjie Hu +3 more
The paper introduces PaSBench-Video, a comprehensive streaming video benchmark designed to rigorously test multimodal LLMs' ability to issue proactive safety warnings, finding that current models stru…
The paper introduces a structured benchmark (TGAD) showing that current text-guided anomaly detection models often overstate their language conditioning, as performance significantly degrades when the…
The paper introduces Opir, an efficient family of encoder-based multi-task guardrail models that provides competitive safety classification performance across various tasks while maintaining a signifi…
Jeyeon Eo, Joo Young Kim, Ran Ju, Minyoung Jung +1 more
BuddyBench introduces a novel, privacy-constrained multi-task benchmark that integrates longitudinal learning trajectories, standardized clinical assessments, and randomized trial data to advance pedi…
Yanyan Luo, Xue Han, Chunxu Zhao, Ruiqiao Bai +4 more
The paper introduces ChildEval, a large-scale benchmark designed to systematically evaluate how well large language models can infer and follow complex, child-specific preferences during long-context…
Xunguang Wang, Yuguang Zhou, Qingyue Wang, Zongjie Li +4 more
This paper introduces a novel framework, the Reasoning Safety Monitor, to detect and prevent logical inconsistencies and adversarial manipulations within the internal reasoning steps of large language…
GLiGuard introduces a compact, schema-conditioned bidirectional encoder that achieves state-of-the-art performance in LLM content moderation across multiple safety dimensions while drastically reducin…
Jiahe Guo, Xiangran Guo, Jiaxuan Chen, Weixiang Zhao +5 more
This paper introduces the concept of Safety Geometry Collapse, demonstrating that multimodal inputs degrade the safety separation of LLMs, and proposes ReGap, a training-free method that adaptively co…
Chengshuai Zhao, Zhen Tan, Dawei Li, Zhiyuan Yu +1 more
The paper proposes MMGuard, a proactive defense mechanism that injects unlearnable, human-imperceptible perturbations into multimodal data to prevent unauthorized fine-tuning of Large Vision-Language…
This paper presents an open-source computer vision pipeline for classifying vehicle body types from naturalistic roadway video.
Yuan Tian, Bing Hu, Fang Wu, Xiaomin Li +2 more
The paper investigates multimodal jailbreak robustness across various reasoning paradigms and finds that explicit image-tool interaction significantly improves safety by shifting the model's internal…
Yuan Tian, Bing Hu, Fang Wu, Xiaomin Li +2 more
The paper investigates multimodal jailbreak robustness across various reasoning paradigms and finds that explicit image-tool interaction significantly improves safety by guiding the model's internal r…
Xinlei Guan, David Arosemena, Tejaswi Dhandu, Kuan Huang +6 more
The paper proposes an end-to-end forensic pipeline using steganographic attribution and multimodal harm detection to reliably trace and attribute harmful misuse of AI-generated imagery on social platf…