~ similar to 2605.27911· 20 results
Liuliu Chen, Elise R. Carrotte, Brian E. Chapman, Jo Robinson +1 more
The paper introduces FigSIM, the first fine-grained dataset for analyzing suicide memes, which is used to benchmark models across tasks like suicide severity and figurative language detection.
Zhongjie Ba, Liang Yi, Peng Cheng, Qingcao Li +2 more
The paper introduces ToxiAlert-Bench, a large-scale audio dataset that uniquely annotates both textual and paralinguistic sources of toxicity, and proposes a dual-head neural network that significantl…
The paper finds that while LLMs can detect distress regardless of delusional framing, they significantly fail to intervene safely when distress is intertwined with delusion, suggesting a critical reco…
This paper addresses the lack of specialized NLP tools for detecting toxicity in real-time video game chat by creating a large, fine-grained dataset and developing a superior, domain-specific detector…
Jiwon Kim, Maya Ajit, Sherry Gong, Soorya Ram Shimgekar +3 more
The paper introduces LLUMI, an open-source framework that improves LLM writing assistance for mental health support using community feedback, demonstrating comparable performance to proprietary models…
This paper investigates why self-harm prediction models struggle to generalize across different hospitals, finding that variations in local lexical expression and feature importance are the primary ca…
This study evaluated Roblox's chat moderation system using a large corpus of 2 million messages, finding that numerous unsafe messages related to grooming, harassment, and self-harm continue to escape…
Giulia Pucci, Emily Hemendinger, Ruizhe Li, Gavin Abercrombie +2 more
This paper systematically evaluates how LLMs uncritically adapt to potentially dangerous user prompts related to eating disorders, finding that specific linguistic cues significantly increase the like…
This paper introduces HarmAmp, a new benchmark for multi-turn harm amplification, and proposes TrajSafe, a proactive monitoring system that significantly reduces harmfulness in LLM interactions while…
Roy Ricaldi, Maximilian Schafer, Philipp Zech, Luca Allodi +2 more
This study provides a longitudinal analysis of dark web content, revealing that cybercrime discussions are dominated by a few persistent core topics rather than rapidly shifting themes.
Darlan Noetzold, Anubis Graciela De Moraes Rossetto, Juan Francisco De Paz Santana, Valderi Reis Quietinho Leithardt
The paper proposes a unified, microservices-based platform that integrates endpoint telemetry and predictive NLP models to provide real-time, correlated alerting for security risks and hate speech.
Michael S. Lee, Yash Maurya, Drew Rein, Bert Herring +12 more
The paper introduces ROK-FORTRESS, a novel bilingual, culturally adversarial benchmark that demonstrates that LLM safety behavior in high-stakes scenarios is significantly shaped by the interaction be…
The paper demonstrates that increasing the toxicity of prompts significantly degrades the factual reliability of LLMs, a degradation linked to the selective amplification of perturbation-sensitive nod…
Jun Rui Huang, Wang Bill Zhu, Ziyi Liu, Nathanael Fast +2 more
The paper introduces EUDAIMONIA, a new framework and benchmark for evaluating how well LLMs align with user welfare in social interactions, finding that even state-of-the-art models frequently violate…
The paper proposes 'Think Fast, Talk Smart,' a pipeline that separates deterministic data analysis from LLM generation, showing that offloading recurring, structured tasks to code significantly improv…
ContextualJailbreak introduces an evolutionary red-teaming strategy that performs automated search over simulated multi-turn primed dialogues, achieving high jailbreak rates across multiple state-of-t…
The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…
The paper introduces the Sovereign Context Protocol (SCP), an open-source, attribution-aware data access layer designed to standardize how Large Language Models (LLMs) connect to and track usage of hu…
The paper demonstrates that generative AI can automate and scale highly personalized, context-aware spear-phishing attacks using only public social media data, resulting in messages that are significa…
The paper introduces TeleHunt, a comprehensive framework and tool that systematically evaluates various strategies for efficiently discovering cybercriminal communities operating on Telegram.