Papers similar to 2606.02523

18 results

cs.AI · Recent · May 27, 2026

SuiChat-CN: Benchmarking Contextual Suicide Risk Assessment in Chinese Group Chats

Xiangyu Wang, Zhiwei Yu, Chengze Du, Dingchang Wang +2 more

The paper introduces SuiChat-CN, a novel Chinese group-chat benchmark for contextual suicide risk assessment, demonstrating that multi-party conversational context is crucial for accurate detection.

cs.CL, cs.AI, cs.CV · Recent · May 29, 2026

FBHM: Functional Benchmarking and Steering of VLMs for Hateful Meme Detection

Paramananda Bhaskar, Naquee Rizwan, Daksh Jogchand, Saurabh Kumar Pandey +1 more

The paper introduces FBHM, a new benchmark for hateful memes, and proposes LSV, a steering vector method that significantly improves VLM performance by addressing the generalization gap.

cs.CL · Recent · Jun 1, 2026

Why Do Self-Harm Prediction Models Struggle to Generalise? Lexical and Semantic Variations in Emergency Department Triage Notes

Liuliu Chen, Mike Conway, Jo Robinson, Vlada Rozova

This paper investigates why self-harm prediction models struggle to generalize across different hospitals, finding that variations in local lexical expression and feature importance are the primary ca…

cs.CR, cs.AI, cs.MM · Recent · Mar 23, 2026

Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee

This paper introduces ComicJailbreak, a new benchmark demonstrating that structured visual narratives can effectively jailbreak Multimodal Large Language Models (MLLMs), requiring new safety alignment…

cs.HC, cs.AI, cs.CL · Recent · May 28, 2026

LLUMI: Improving LLM Writing Assistance for Mental Health Support with Online Community Feedback

Jiwon Kim, Maya Ajit, Sherry Gong, Soorya Ram Shimgekar +3 more

The paper introduces LLUMI, an open-source framework that improves LLM writing assistance for mental health support using community feedback, demonstrating comparable performance to proprietary models…

cs.CV, cs.AI, cs.CR · Recent · Apr 12, 2026

Toward Accountable AI-Generated Content on Social Platforms: Steganographic Attribution and Multimodal Harm Detection

Xinlei Guan, David Arosemena, Tejaswi Dhandu, Kuan Huang +6 more

The paper proposes an end-to-end forensic pipeline using steganographic attribution and multimodal harm detection to reliably trace and attribute harmful misuse of AI-generated imagery on social platf…

cs.SD, cs.AI, cs.CR · Recent · May 15, 2026

Beyond Content: A Comprehensive Speech Toxicity Dataset and Detection Framework Incorporating Paralinguistic Cues

Zhongjie Ba, Liang Yi, Peng Cheng, Qingcao Li +2 more

The paper introduces ToxiAlert-Bench, a large-scale audio dataset that uniquely annotates both textual and paralinguistic sources of toxicity, and proposes a dual-head neural network that significantl…

cs.CR, cs.CY, cs.LG · Recent · Apr 11, 2026

"bot lane noob" Towards Deployment of NLP-based Toxicity Detectors in Video Games

Jonas Ave, Irdin Pekaric, Matthias Frohner, Giovanni Apruzzese

This paper addresses the lack of specialized NLP tools for detecting toxicity in real-time video game chat by creating a large, fine-grained dataset and developing a superior, domain-specific detector…

cs.CY, cs.CR · Recent · May 6, 2026

An Evaluation of Chat Safety Moderations in Roblox

Priya Kaushik, Sonja Brown, Rakibul Hasan, Sazzadur Rahaman

This study evaluated Roblox's chat moderation system using a large corpus of 2 million messages, finding that numerous unsafe messages related to grooming, harassment, and self-harm continue to escape…

cs.CL · Recent · May 31, 2026

Lost in Delusion: Examining LLM Safety Under User Delusions and Distress

Andrew Aquilina, Chetna Nihalani, Vasudha Varadarajan, Nathan S. Fishbein +2 more

The paper finds that while LLMs can detect distress regardless of delusional framing, they significantly fail to intervene safely when distress is intertwined with delusion, suggesting a critical reco…

cs.CV, cs.CR · Recent · Mar 17, 2026

KidsNanny: A Two-Stage Multimodal Content Moderation Pipeline Integrating Visual Classification, Object Detection, OCR, and Contextual Reasoning for Child Safety

Viraj Panchal, Tanmay Talsaniya, Parag Patel, Meet Patel

KidsNanny is a two-stage multimodal content moderation pipeline that achieves high accuracy and efficiency in detecting child safety threats, particularly excelling in text-embedded content.

cs.AI, cs.CL, cs.LG · Recent · May 28, 2026

MuPHI: Learning Implicit Multimodal Harm Reasoning via Semantically Grounded Reward Optimization

Anisha Saha, Varsha Suresh, Teodora Kamova, Sophia Wiedmann +2 more

The paper introduces MuPHI, a dataset, and MuPHIRM, a reasoning-augmented training framework, to improve Vision-Language Models' ability to detect and reason about subtle, context-dependent multimodal…

cs.CL · Recent · Jun 1, 2026

When Meaning Travels: A Granular Lens on Hybrid-MoE's Role in Idiomatic Understanding for Language Models

Sarmistha Das, Vaibhav Vishal, Shreyas Guha, Amaan Ali +2 more

This paper introduces a Hybrid Mixture-of-Experts (HybridMoE) framework and a specialized corpus (Varnika) to significantly improve language models' ability to understand and retain figurative, cultur…

cs.CR · Recent · May 14, 2026

Topical Shifts in the Dark Web: A Longitudinal Analysis of Content from the Cybercrime Ecosystem

Roy Ricaldi, Maximilian Schafer, Philipp Zech, Luca Allodi +2 more

This study provides a longitudinal analysis of dark web content, revealing that cybercrime discussions are dominated by a few persistent core topics rather than rapidly shifting themes.

cs.CR · Recent · Apr 13, 2026

A Synthetic Conversational Smishing Dataset for Social Engineering Detection

Carl Lochstampfor, Ayan Roy

The paper introduces a synthetic dataset of multi-round conversations to detect conversational smishing, finding that XGBoost with TF-IDF features achieved the best performance (72.5% accuracy).

cs.CV, cs.AI, cs.CR · Recent · Mar 25, 2026

When Understanding Becomes a Risk: Authenticity and Safety Risks in the Emerging Image Generation Paradigm

Ye Leng, Junjie Chu, Mingjie Li, Chenhao Lin +4 more

The paper finds that while multimodal large language models (MLLMs) offer superior semantic understanding for image generation, this enhanced capability significantly increases safety risks, partic…

cs.CL, cs.CR · Recent · Mar 24, 2026

Foundational Study on Authorship Attribution of Japanese Web Reviews for Actor Analysis

Hiroshi Matsubara, Shingo Matsugaya, Taichi Aoki, Masaki Hashimoto

This study compares various authorship attribution methods on Japanese web reviews, finding that while BERT fine-tuning performs best, TF-IDF+LR offers superior stability and efficiency for large-scal…

cs.CL, cs.AI, cs.CY · Recent · May 29, 2026

Toxic HallucinAItions: Perturbing Prompts and Tracing LLM Circuits

Soorya Ram Shimgekar, Agam Goyal, Amruta Parulekar, Joshua Chen +5 more

The paper demonstrates that increasing the toxicity of prompts significantly degrades the factual reliability of LLMs, a degradation linked to the selective amplification of perturbation-sensitive nod…
