Papers similar to 2606.02444

~ similar to 2606.02444· 20 results

cs.HCcs.AIcs.CLRecentMay 28, 2026

Inform, Coach, Relate, Listen: Auditing LLM Caregiving Support Roles

Drishti Goel, Agam Goyal, Veda Duddu, Olivia Pal +7 more

This study demonstrates that an LLM's assigned support role (e.g., Inform, Coach, Relate) significantly alters its safety profile and the types of risks it presents when assisting users in complex car…

View →

cs.HCcs.AIcs.CLRecentMay 28, 2026

LLUMI: Improving LLM Writing Assistance for Mental Health Support with Online Community Feedback

Jiwon Kim, Maya Ajit, Sherry Gong, Soorya Ram Shimgekar +3 more

The paper introduces LLUMI, an open-source framework that improves LLM writing assistance for mental health support using community feedback, demonstrating comparable performance to proprietary models…

View →

cs.CLRecentMay 31, 2026

Lost in Delusion: Examining LLM Safety Under User Delusions and Distress

Andrew Aquilina, Chetna Nihalani, Vasudha Varadarajan, Nathan S. Fishbein +2 more

The paper finds that while LLMs can detect distress regardless of delusional framing, they significantly fail to intervene safely when distress is intertwined with delusion, suggesting a critical reco…

View →

cs.CLcs.AIcs.CYRecentMay 29, 2026

Toxic HallucinAItions: Perturbing Prompts and Tracing LLM Circuits

Soorya Ram Shimgekar, Agam Goyal, Amruta Parulekar, Joshua Chen +5 more

The paper demonstrates that increasing the toxicity of prompts significantly degrades the factual reliability of LLMs, a degradation linked to the selective amplification of perturbation-sensitive nod…

View →

cs.CLcs.LGRecentJun 1, 2026

Investigating and Alleviating Harm Amplification in LLM Interactions

Ruohao Guo, Wei Xu, Alan Ritter

This paper introduces HarmAmp, a new benchmark for multi-turn harm amplification, and proposes TrajSafe, a proactive monitoring system that significantly reduces harmfulness in LLM interactions while…

View →

cs.HCcs.CRRecentMay 11, 2026

When Are LLM Inferences Acceptable? User Reactions and Control Preferences for Inferred Personal Information

Kyzyl Monteiro, Minjung Park, Alexander Ioffrida, Angelina Sanna +5 more

This study investigated user reactions to inferred personal information from their own ChatGPT histories, finding that acceptability is governed by context-sensitive norms regarding generation, retent…

View →

cs.CLRecentJun 1, 2026

Why Do Self-Harm Prediction Models Struggle to Generalise? Lexical and Semantic Variations in Emergency Department Triage Notes

Liuliu Chen, Mike Conway, Jo Robinson, Vlada Rozova

This paper investigates why self-harm prediction models struggle to generalize across different hospitals, finding that variations in local lexical expression and feature importance are the primary ca…

View →

cs.AIRecentMay 28, 2026

Think Fast, Talk Smart: Partitioning Deterministic and Neural Computation for Structured Health Text Generation

Kai-Chen Cheng, Haejun Han, David Q. Sun

The paper proposes 'Think Fast, Talk Smart,' a pipeline that separates deterministic data analysis from LLM generation, showing that offloading recurring, structured tasks to code significantly improv…

View →

cs.CLcs.AIcs.HCRecentMay 28, 2026

EUDAIMONIA: Evaluating Undesirable Dynamics in AI

Jun Rui Huang, Wang Bill Zhu, Ziyi Liu, Nathanael Fast +2 more

The paper introduces EUDAIMONIA, a new framework and benchmark for evaluating how well LLMs align with user welfare in social interactions, finding that even state-of-the-art models frequently violate…

View →

cs.CRcs.CYcs.LGRecentApr 11, 2026

"bot lane noob" Towards Deployment of NLP-based Toxicity Detectors in Video Games

Jonas Ave, Irdin Pekaric, Matthias Frohner, Giovanni Apruzzese

This paper addresses the lack of specialized NLP tools for detecting toxicity in real-time video game chat by creating a large, fine-grained dataset and developing a superior, domain-specific detector…

View →

cs.CYcs.CRRecentMay 6, 2026

An Evaluation of Chat Safety Moderations in Roblox

Priya Kaushik, Sonja Brown, Rakibul Hasan, Sazzadur Rahaman

This study evaluated Roblox's chat moderation system using a large corpus of 2 million messages, finding that numerous unsafe messages related to grooming, harassment, and self-harm continue to escape…

View →

cs.CRcs.AIRecentApr 11, 2026

Like a Hammer, It Can Build, It Can Break: Large Language Model Uses, Perceptions, and Adoption in Cybersecurity Operations on Reddit

Souradip Nath, Chih-Yi Huang, Aditi Ganapathi, Kashyap Thimmaraju +2 more

Analyzing Reddit discussions, the paper finds that while security practitioners see LLMs as useful for boosting productivity, their adoption is constrained by concerns over reliability, verification,…

View →

cs.CRRecentApr 1, 2026

Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models

Weidi Luo, Xiaofei Wen, Tenghao Huang, Hongyi Wang +4 more

The paper introduces FoodGuardBench, a comprehensive benchmark and a specialized guardrail model (FoodGuard-4B) to rigorously test and mitigate the severe food safety risks posed by large language mod…

View →

cs.CLcs.AIRecentJun 1, 2026

Identifying High-Confidence Social Biases in LLMs for Trustworthy Conversational Tutoring Agents

Aitor Arronte Alvarez, Naiyi Xie Fincham

This study evaluates LLMs in conversational tutoring to identify high-confidence social biases, finding that state-of-the-art models are often overconfident in their incorrect assessments of stereotyp…

View →

cs.AIcs.CYq-fin.RMRecentMay 27, 2026

The Ethics of LLM Sandbox and Persona Dynamics

Tim Gebbie, Stewart Gebbie

The paper argues that LLM guardrails and persona dynamics create an unethical 'reality gap' by laundering epistemic risk onto users, advocating for task-level causal requirements over response-level m…

View →

cs.CRRecentMay 6, 2026

Evaluating the Reliability of Multiple Large Language Models in Risk Assessment: A CIS Controls Based Approach

Gustavo Roberto Pinto, Arthur do Prado Labaki, Rodrigo Sanches Miani

The study compared the cybersecurity risk assessment capabilities of five popular large language models (LLMs) against human experts, finding that LLMs consistently underestimated risks and require ma…

View →

cs.CVcs.AIcs.CRRecentMar 25, 2026

When Understanding Becomes a Risk: Authenticity and Safety Risks in the Emerging Image Generation Paradigm

Ye Leng, Junjie Chu, Mingjie Li, Chenhao Lin +4 more

The paper analyzes that while multimodal large language models (MLLMs) offer superior semantic understanding for image generation, this enhanced capability significantly increases safety risks, partic…

View →

cs.CRcs.AIcs.CLRecentMay 1, 2026

When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI

Alfredo Madrid-García, Miguel Rujas

This paper demonstrates that patient-facing RAG chatbots frequently expose sensitive system configurations, knowledge base details, and conversation history through client-server communication, posing…

View →

cs.AIcs.CRRecentMay 18, 2026

Safety Geometry Collapse in Multimodal LLMs and Adaptive Drift Correction

Jiahe Guo, Xiangran Guo, Jiaxuan Chen, Weixiang Zhao +5 more

This paper introduces the concept of Safety Geometry Collapse, demonstrating that multimodal inputs degrade the safety separation of LLMs, and proposes ReGap, a training-free method that adaptively co…

View →

cs.CLcs.AIcs.LGRecentMay 30, 2026

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

Etienne Casanova, Rafal Kocielnik, R. Michael Alvarez

The paper demonstrates that LLM performance in zero-shot annotation is significantly limited by the alignment between the model's internal understanding and the task definition, showing that prompt-ba…

View →