Similar to 2606.02523 · 18 results
Xiangyu Wang, Zhiwei Yu, Chengze Du, Dingchang Wang +2 more
The paper introduces SuiChat-CN, a novel Chinese group-chat benchmark for contextual suicide risk assessment, demonstrating that multi-party conversational context is crucial for accurate detection.
The paper introduces FBHM, a new benchmark for hateful memes, and proposes LSV, a steering vector method that significantly improves VLM performance by addressing the generalization gap.
This paper investigates why self-harm prediction models struggle to generalize across different hospitals, finding that variations in local lexical expression and feature importance are the primary ca…
This paper introduces ComicJailbreak, a new benchmark demonstrating that structured visual narratives can effectively jailbreak Multimodal Large Language Models (MLLMs), requiring new safety alignment…
Jiwon Kim, Maya Ajit, Sherry Gong, Soorya Ram Shimgekar +3 more
The paper introduces LLUMI, an open-source framework that improves LLM writing assistance for mental health support using community feedback, demonstrating comparable performance to proprietary models…
Xinlei Guan, David Arosemena, Tejaswi Dhandu, Kuan Huang +6 more
The paper proposes an end-to-end forensic pipeline using steganographic attribution and multimodal harm detection to reliably trace and attribute harmful misuse of AI-generated imagery on social platf…
Zhongjie Ba, Liang Yi, Peng Cheng, Qingcao Li +2 more
The paper introduces ToxiAlert-Bench, a large-scale audio dataset that uniquely annotates both textual and paralinguistic sources of toxicity, and proposes a dual-head neural network that significantl…
This paper addresses the lack of specialized NLP tools for detecting toxicity in real-time video game chat by creating a large, fine-grained dataset and developing a superior, domain-specific detector…
This study evaluated Roblox's chat moderation system using a large corpus of 2 million messages, finding that numerous unsafe messages related to grooming, harassment, and self-harm continue to escape…
The paper finds that while LLMs can detect distress regardless of delusional framing, they significantly fail to intervene safely when distress is intertwined with delusion, suggesting a critical reco…
KidsNanny is a two-stage multimodal content moderation pipeline that achieves high accuracy and efficiency in detecting child safety threats, particularly excelling in text-embedded content.
Anisha Saha, Varsha Suresh, Teodora Kamova, Sophia Wiedmann +2 more
The paper introduces MuPHI, a dataset, and MuPHIRM, a reasoning-augmented training framework, to improve Vision-Language Models' ability to detect and reason about subtle, context-dependent multimodal…
Sarmistha Das, Vaibhav Vishal, Shreyas Guha, Amaan Ali +2 more
This paper introduces a Hybrid Mixture-of-Experts (HybridMoE) framework and a specialized corpus (Varnika) to significantly improve language models' ability to understand and retain figurative, cultur…
Roy Ricaldi, Maximilian Schafer, Philipp Zech, Luca Allodi +2 more
This study provides a longitudinal analysis of dark web content, revealing that cybercrime discussions are dominated by a few persistent core topics rather than rapidly shifting themes.
The paper introduces a synthetic dataset of multi-round conversations to detect conversational smishing, finding that XGBoost with TF-IDF features achieved the best performance (72.5% accuracy).
Ye Leng, Junjie Chu, Mingjie Li, Chenhao Lin +4 more
The paper finds that while multimodal large language models (MLLMs) offer superior semantic understanding for image generation, this enhanced capability significantly increases safety risks, partic…
This study compares various authorship attribution methods on Japanese web reviews, finding that while BERT fine-tuning performs best, TF-IDF+LR offers superior stability and efficiency for large-scal…
The paper demonstrates that increasing the toxicity of prompts significantly degrades the factual reliability of LLMs, a degradation linked to the selective amplification of perturbation-sensitive nod…