ArXivCSExplorer
☆☆Bookmarks🏆RSSHow to UseFAQ
Built with and by Teycir Ben Soltane•
How to Use•FAQ•GitHub•arXiv.org•
Share:

~ similar to 2605.27911· 20 results

cs.CLcs.CVcs.CYRecentJun 1, 2026

FigSIM: A Dataset for Fine-grained Suicide Severity and Figurative Language in Suicide Memes

Liuliu Chen, Elise R. Carrotte, Brian E. Chapman, Jo Robinson +1 more

The paper introduces FigSIM, the first fine-grained dataset for analyzing suicide memes, which is used to benchmark models across tasks like suicide severity and figurative language detection.

View →
cs.SDcs.AIcs.CRRecentMay 15, 2026

Beyond Content: A Comprehensive Speech Toxicity Dataset and Detection Framework Incorporating Paralinguistic Cues

Zhongjie Ba, Liang Yi, Peng Cheng, Qingcao Li +2 more

The paper introduces ToxiAlert-Bench, a large-scale audio dataset that uniquely annotates both textual and paralinguistic sources of toxicity, and proposes a dual-head neural network that significantl…

View →
cs.CLRecentMay 31, 2026

Lost in Delusion: Examining LLM Safety Under User Delusions and Distress

Andrew Aquilina, Chetna Nihalani, Vasudha Varadarajan, Nathan S. Fishbein +2 more

The paper finds that while LLMs can detect distress regardless of delusional framing, they significantly fail to intervene safely when distress is intertwined with delusion, suggesting a critical reco…

View →
cs.CRcs.CYcs.LGRecentApr 11, 2026

"bot lane noob" Towards Deployment of NLP-based Toxicity Detectors in Video Games

Jonas Ave, Irdin Pekaric, Matthias Frohner, Giovanni Apruzzese

This paper addresses the lack of specialized NLP tools for detecting toxicity in real-time video game chat by creating a large, fine-grained dataset and developing a superior, domain-specific detector…

View →
cs.HCcs.AIcs.CLRecentMay 28, 2026

LLUMI: Improving LLM Writing Assistance for Mental Health Support with Online Community Feedback

Jiwon Kim, Maya Ajit, Sherry Gong, Soorya Ram Shimgekar +3 more

The paper introduces LLUMI, an open-source framework that improves LLM writing assistance for mental health support using community feedback, demonstrating comparable performance to proprietary models…

View →
cs.CLRecentJun 1, 2026

Why Do Self-Harm Prediction Models Struggle to Generalise? Lexical and Semantic Variations in Emergency Department Triage Notes

Liuliu Chen, Mike Conway, Jo Robinson, Vlada Rozova

This paper investigates why self-harm prediction models struggle to generalize across different hospitals, finding that variations in local lexical expression and feature importance are the primary ca…

View →
cs.CYcs.CRRecentMay 6, 2026

An Evaluation of Chat Safety Moderations in Roblox

Priya Kaushik, Sonja Brown, Rakibul Hasan, Sazzadur Rahaman

This study evaluated Roblox's chat moderation system using a large corpus of 2 million messages, finding that numerous unsafe messages related to grooming, harassment, and self-harm continue to escape…

View →
cs.AIcs.CLRecentJun 1, 2026

Food Noise & False Safety: A Systematic Evaluation of How LLMs Fail to Adapt to Eating Disorder Queries with Clinician Feedback

Giulia Pucci, Emily Hemendinger, Ruizhe Li, Gavin Abercrombie +2 more

This paper systematically evaluates how LLMs uncritically adapt to potentially dangerous user prompts related to eating disorders, finding that specific linguistic cues significantly increase the like…

View →
cs.CLcs.LGRecentJun 1, 2026

Investigating and Alleviating Harm Amplification in LLM Interactions

Ruohao Guo, Wei Xu, Alan Ritter

This paper introduces HarmAmp, a new benchmark for multi-turn harm amplification, and proposes TrajSafe, a proactive monitoring system that significantly reduces harmfulness in LLM interactions while…

View →
cs.CRRecentMay 14, 2026

Topical Shifts in the Dark Web: A Longitudinal Analysis of Content from the Cybercrime Ecosystem

Roy Ricaldi, Maximilian Schafer, Philipp Zech, Luca Allodi +2 more

This study provides a longitudinal analysis of dark web content, revealing that cybercrime discussions are dominated by a few persistent core topics rather than rapidly shifting themes.

View →
cs.CRRecentMay 12, 2026

A microservices-based endpoint monitoring platform with predictive NLP models for real-time security and hate-speech risk alerting

Darlan Noetzold, Anubis Graciela De Moraes Rossetto, Juan Francisco De Paz Santana, Valderi Reis Quietinho Leithardt

The paper proposes a unified, microservices-based platform that integrates endpoint telemetry and predictive NLP models to provide real-time, correlated alerting for security risks and hate speech.

View →
cs.CLcs.AIcs.CRRecentMay 13, 2026

ROK-FORTRESS: Measuring the Effect of Geopolitical Transcreation for National Security and Public Safety

Michael S. Lee, Yash Maurya, Drew Rein, Bert Herring +12 more

The paper introduces ROK-FORTRESS, a novel bilingual, culturally adversarial benchmark that demonstrates that LLM safety behavior in high-stakes scenarios is significantly shaped by the interaction be…

View →
cs.CLcs.AIcs.CYRecentMay 29, 2026

Toxic HallucinAItions: Perturbing Prompts and Tracing LLM Circuits

Soorya Ram Shimgekar, Agam Goyal, Amruta Parulekar, Joshua Chen +5 more

The paper demonstrates that increasing the toxicity of prompts significantly degrades the factual reliability of LLMs, a degradation linked to the selective amplification of perturbation-sensitive nod…

View →
cs.CLcs.AIcs.HCRecentMay 28, 2026

EUDAIMONIA: Evaluating Undesirable Dynamics in AI

Jun Rui Huang, Wang Bill Zhu, Ziyi Liu, Nathanael Fast +2 more

The paper introduces EUDAIMONIA, a new framework and benchmark for evaluating how well LLMs align with user welfare in social interactions, finding that even state-of-the-art models frequently violate…

View →
cs.AIRecentMay 28, 2026

Think Fast, Talk Smart: Partitioning Deterministic and Neural Computation for Structured Health Text Generation

Kai-Chen Cheng, Haejun Han, David Q. Sun

The paper proposes 'Think Fast, Talk Smart,' a pipeline that separates deterministic data analysis from LLM generation, showing that offloading recurring, structured tasks to code significantly improv…

View →
cs.CLcs.CRRecentMay 4, 2026

ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming

Mario Rodríguez Béjar, Francisco J. Cortés-Delgado, S. Braghin, Jose L. Hernández-Ramos

ContextualJailbreak introduces an evolutionary red-teaming strategy that performs automated search over simulated multi-turn primed dialogues, achieving high jailbreak rates across multiple state-of-t…

View →
cs.CRcs.SERecentMay 4, 2026

A Validated Prompt Bank for Malicious Code Generation: Separating Executable Weapons from Security Knowledge in 1,554 Consensus-Labeled Prompts

Richard J. Young, Gregory D. Moody

The paper introduces a validated, consensus-labeled prompt bank that separates requests for executable malicious code (weapons) from requests for general harmful security knowledge, providing a more g…

View →
cs.CRcs.AIcs.LGRecentMar 28, 2026

Sovereign Context Protocol: An Open Attribution Layer for Human-Generated Content in the Age of Large Language Models

Praneel Panchigar, Torlach Rush, Matthew Canabarro

The paper introduces the Sovereign Context Protocol (SCP), an open-source, attribution-aware data access layer designed to standardize how Large Language Models (LLMs) connect to and track usage of hu…

View →
cs.CRRecentMay 11, 2026

Context-Aware Spear Phishing: Generative AI-Enabled Attacks Against Individuals via Public Social Media Data

Elham Pourabbas Vafa, Sayak Saha Roy, Shirin Nilizadeh

The paper demonstrates that generative AI can automate and scale highly personalized, context-aware spear-phishing attacks using only public social media data, resulting in messages that are significa…

View →
cs.CRRecentJun 3, 2026

TeleHunt: A Framework and Tool for Efficient Cybercriminal Community Discovery on Telegram

Roy Ricaldi, Victor Asanache, Luca Allodi

The paper introduces TeleHunt, a comprehensive framework and tool that systematically evaluates various strategies for efficiently discovering cybercriminal communities operating on Telegram.

View →