~ similar to 2605.04491v2· 20 results
This paper addresses the lack of specialized NLP tools for detecting toxicity in real-time video game chat by creating a large, fine-grained dataset and developing a superior, domain-specific detector…
The paper analyzes Codes of Conduct (CoCs) for online video games using a novel pipeline, finding that most multiplayer games lack CoCs despite safety needs, and that CoCs often lack specificity regar…
Xiangyu Wang, Zhiwei Yu, Chengze Du, Dingchang Wang +2 more
The paper introduces SuiChat-CN, a novel Chinese group-chat benchmark for contextual suicide risk assessment, demonstrating that multi-party conversational context is crucial for accurate detection.
Taro Tsuchiya, Haoxiang Yu, Tina Marjanov, Alice Hutchings +2 more
This paper provides a large-scale characterization of Telegram bots, revealing that while they serve useful functions like crowdsourcing, they are also extensively used for malicious activities such a…
KidsNanny is a two-stage multimodal content moderation pipeline that achieves high accuracy and efficiency in detecting child safety threats, particularly excelling in text-embedded content.
Giulia Pucci, Emily Hemendinger, Ruizhe Li, Gavin Abercrombie +2 more
This paper systematically evaluates how LLMs uncritically adapt to potentially dangerous user prompts related to eating disorders, finding that specific linguistic cues significantly increase the like…
This paper introduces HarmAmp, a new benchmark for multi-turn harm amplification, and proposes TrajSafe, a proactive monitoring system that significantly reduces harmfulness in LLM interactions while…
The paper finds that while LLMs can detect distress regardless of delusional framing, they significantly fail to intervene safely when distress is intertwined with delusion, suggesting a critical reco…
This paper introduces ComicJailbreak, a new benchmark demonstrating that structured visual narratives can effectively jailbreak Multimodal Large Language Models (MLLMs), requiring new safety alignment…
This paper demonstrates that patient-facing RAG chatbots frequently expose sensitive system configurations, knowledge base details, and conversation history through client-server communication, posing…
The paper investigates how different methods of jailbreaking large language models (SFT, RLVR, and abliteration) lead to vastly different behavioral and mechanistic failures, even when all methods ach…
Roy Ricaldi, Maximilian Schafer, Philipp Zech, Luca Allodi +2 more
This study provides a longitudinal analysis of dark web content, revealing that cybercrime discussions are dominated by a few persistent core topics rather than rapidly shifting themes.
Chang Jin, An Wang, Zeming Wei, Kai Wang +6 more
The paper introduces SkillSafetyBench, a comprehensive benchmark demonstrating that agent safety failures often stem from adversarial influences within reusable skills and execution environments, rath…
This study proposes a negotiation framework, using composite indices (RBTI and CATI), to explain how youth navigate competing privacy pressures when using smart voice assistants, finding that high usa…
Yukun Jiang, Yage Zhang, Michael Backes, Xinyue Shen +1 more
This paper presents HarmfulSkillBench, a large-scale benchmark demonstrating that even small percentages of publicly available skills can be misused for harmful actions, significantly lowering LLM ref…
The paper introduces TraceSafe-Bench, a comprehensive benchmark, and finds that securing LLM agents requires jointly optimizing for structural reasoning and safety alignment to mitigate risks during m…
Darlan Noetzold, Anubis Graciela De Moraes Rossetto, Juan Francisco De Paz Santana, Valderi Reis Quietinho Leithardt
The paper proposes a unified, microservices-based platform that integrates endpoint telemetry and predictive NLP models to provide real-time, correlated alerting for security risks and hate speech.
Zheng-Xin Yong, Parv Mahajan, Andy Wang, Ida Caspary +11 more
The paper conducts a preliminary safety evaluation of the open-weight LLM Kimi K2.5, finding that while it is highly capable, it exhibits concerning dual-use risks, particularly regarding CBRNE misuse…
This paper demonstrates a novel, multi-stage privacy-leakage attack chain against black-box chatbot agents by combining indirect prompt injection with web-tool invocation, showing that such attacks ar…
The paper introduces TeleHunt, a comprehensive framework and tool that systematically evaluates various strategies for efficiently discovering cybercriminal communities operating on Telegram.