Papers similar to 2605.04491v2

~ similar to 2605.04491v2· 20 results

cs.CRcs.CYcs.LGRecentApr 11, 2026

"bot lane noob" Towards Deployment of NLP-based Toxicity Detectors in Video Games

Jonas Ave, Irdin Pekaric, Matthias Frohner, Giovanni Apruzzese

This paper addresses the lack of specialized NLP tools for detecting toxicity in real-time video game chat by creating a large, fine-grained dataset and developing a superior, domain-specific detector…

View →

cs.CRcs.HCRecentMay 14, 2026

Analyzing Codes of Conduct for Online Safety in Video Games at Scale

Jiuming Jiang, Shidong Pan, Daniel W Woods, Jingjie Li

The paper analyzes Codes of Conduct (CoCs) for online video games using a novel pipeline, finding that most multiplayer games lack CoCs despite safety needs, and that CoCs often lack specificity regar…

View →

cs.AIRecentMay 27, 2026

SuiChat-CN: Benchmarking Contextual Suicide Risk Assessment in Chinese Group Chats

Xiangyu Wang, Zhiwei Yu, Chengze Du, Dingchang Wang +2 more

The paper introduces SuiChat-CN, a novel Chinese group-chat benchmark for contextual suicide risk assessment, demonstrating that multi-party conversational context is crucial for accurate detection.

View →

cs.CRcs.CYRecentMar 25, 2026

A Large-Scale Study of Telegram Bots

Taro Tsuchiya, Haoxiang Yu, Tina Marjanov, Alice Hutchings +2 more

This paper provides a large-scale characterization of Telegram bots, revealing that while they serve useful functions like crowdsourcing, they are also extensively used for malicious activities such a…

View →

cs.CVcs.CRRecentMar 17, 2026

KidsNanny: A Two-Stage Multimodal Content Moderation Pipeline Integrating Visual Classification, Object Detection, OCR, and Contextual Reasoning for Child Safety

Viraj Panchal, Tanmay Talsaniya, Parag Patel, Meet Patel

KidsNanny is a two-stage multimodal content moderation pipeline that achieves high accuracy and efficiency in detecting child safety threats, particularly excelling in text-embedded content.

View →

cs.AIcs.CLRecentJun 1, 2026

Food Noise & False Safety: A Systematic Evaluation of How LLMs Fail to Adapt to Eating Disorder Queries with Clinician Feedback

Giulia Pucci, Emily Hemendinger, Ruizhe Li, Gavin Abercrombie +2 more

This paper systematically evaluates how LLMs uncritically adapt to potentially dangerous user prompts related to eating disorders, finding that specific linguistic cues significantly increase the like…

View →

cs.CLcs.LGRecentJun 1, 2026

Investigating and Alleviating Harm Amplification in LLM Interactions

Ruohao Guo, Wei Xu, Alan Ritter

This paper introduces HarmAmp, a new benchmark for multi-turn harm amplification, and proposes TrajSafe, a proactive monitoring system that significantly reduces harmfulness in LLM interactions while…

View →

cs.CLRecentMay 31, 2026

Lost in Delusion: Examining LLM Safety Under User Delusions and Distress

Andrew Aquilina, Chetna Nihalani, Vasudha Varadarajan, Nathan S. Fishbein +2 more

The paper finds that while LLMs can detect distress regardless of delusional framing, they significantly fail to intervene safely when distress is intertwined with delusion, suggesting a critical reco…

View →

cs.CRcs.AIcs.MMRecentMar 23, 2026

Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models

Rui Yang Tan, Yujia Hu, Roy Ka-Wei Lee

This paper introduces ComicJailbreak, a new benchmark demonstrating that structured visual narratives can effectively jailbreak Multimodal Large Language Models (MLLMs), requiring new safety alignment…

View →

cs.CRcs.AIcs.CLRecentMay 1, 2026

When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI

Alfredo Madrid-García, Miguel Rujas

This paper demonstrates that patient-facing RAG chatbots frequently expose sensitive system configurations, knowledge base details, and conversation history through client-server communication, posing…

View →

cs.CRcs.AIcs.CLRecentApr 20, 2026

Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks

Md Rysul Kabir, Zoran Tiganj

The paper investigates how different methods of jailbreaking large language models (SFT, RLVR, and abliteration) lead to vastly different behavioral and mechanistic failures, even when all methods ach…

View →

cs.CRRecentMay 14, 2026

Topical Shifts in the Dark Web: A Longitudinal Analysis of Content from the Cybercrime Ecosystem

Roy Ricaldi, Maximilian Schafer, Philipp Zech, Luca Allodi +2 more

This study provides a longitudinal analysis of dark web content, revealing that cybercrime discussions are dominated by a few persistent core topics rather than rapidly shifting themes.

View →

cs.CRcs.AIcs.CLRecentMay 12, 2026

SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces

Chang Jin, An Wang, Zeming Wei, Kai Wang +6 more

The paper introduces SkillSafetyBench, a comprehensive benchmark demonstrating that agent safety failures often stem from adversarial influences within reusable skills and execution environments, rath…

View →

cs.CRcs.AIcs.CYRecentApr 4, 2026

Negotiating Privacy with Smart Voice Assistants: Risk-Benefit and Control-Acceptance Tensions

Molly Campbell, Mohamad Sheikho Al Jasem, Ajay Kumar Shrestha

This study proposes a negotiation framework, using composite indices (RBTI and CATI), to explain how youth navigate competing privacy pressures when using smart voice assistants, finding that high usa…

View →

cs.CRcs.AIRecentApr 16, 2026

HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?

Yukun Jiang, Yage Zhang, Michael Backes, Xinyue Shen +1 more

This paper presents HarmfulSkillBench, a large-scale benchmark demonstrating that even small percentages of publicly available skills can be misused for harmful actions, significantly lowering LLM ref…

View →

cs.CRcs.AIcs.CLRecentApr 8, 2026

TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories

Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang, Yun-Nung Chen

The paper introduces TraceSafe-Bench, a comprehensive benchmark, and finds that securing LLM agents requires jointly optimizing for structural reasoning and safety alignment to mitigate risks during m…

View →

cs.CRRecentMay 12, 2026

A microservices-based endpoint monitoring platform with predictive NLP models for real-time security and hate-speech risk alerting

Darlan Noetzold, Anubis Graciela De Moraes Rossetto, Juan Francisco De Paz Santana, Valderi Reis Quietinho Leithardt

The paper proposes a unified, microservices-based platform that integrates endpoint telemetry and predictive NLP models to provide real-time, correlated alerting for security risks and hate speech.

View →

cs.CRcs.AIcs.CLRecentApr 3, 2026

An Independent Safety Evaluation of Kimi K2.5

Zheng-Xin Yong, Parv Mahajan, Andy Wang, Ida Caspary +11 more

The paper conducts a preliminary safety evaluation of the open-weight LLM Kimi K2.5, finding that while it is highly capable, it exhibits concerning dual-use risks, particularly regarding CBRNE misuse…

View →

cs.CRcs.AIcs.HCRecentMay 18, 2026

An Empirical Study of Privacy Leakage Chains via Prompt Injection in Black-Box Chatbot Environments

Hongjang Yang, Hyunsik Na, Daeseon Choi

This paper demonstrates a novel, multi-stage privacy-leakage attack chain against black-box chatbot agents by combining indirect prompt injection with web-tool invocation, showing that such attacks ar…

View →

cs.CRRecentJun 3, 2026

TeleHunt: A Framework and Tool for Efficient Cybercriminal Community Discovery on Telegram

Roy Ricaldi, Victor Asanache, Luca Allodi

The paper introduces TeleHunt, a comprehensive framework and tool that systematically evaluates various strategies for efficiently discovering cybercriminal communities operating on Telegram.

View →