This study demonstrates that an LLM's assigned support role (e.g., Inform, Coach, Relate) significantly alters its safety profile and the types of risks it presents when assisting users in complex caregiving situations.
Language models are increasingly deployed for conversational support in informal caregiving contexts, where interactions often extend beyond information-seeking: caregivers seek emotional reassurance, guidance, and help while navigating uncertain, relationally complex care decisions. Yet most safety evaluations assess model behavior under generic prompts, leaving a critical question unexamined: does a model's safety profile change with its support role? We study this by operationalizing four expert-reviewed support roles grounded in social support theory (Inform, Coach, Relate, and Listen) and comparing them against two baseline controls: a basic prompting condition and a retrieval-augmented generation (RAG) condition. We evaluate three language models (GPT-4o-mini, Llama-3.1-8B-Instruct, and MedGemma-1.5-4b-it) on 5,000 real-world queries from online Alzheimer's Disease and Related Dementias (ADRD) communities. We find that the LLM's support role systematically shapes both the prevalence and the composition of interactional risks. Furthermore, a human evaluation study reveals a perceived quality-safety tension: more directive, information-oriented roles are rated as more helpful and trustworthy despite exhibiting elevated interactional risk profiles. We release ~90,000 support role-conditioned model responses with risk annotations as an ecologically grounded resource for research on safer LLM-mediated conversational support.
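The size of the released dataset can be sanity-checked from the design described above. The following is a minimal sketch, assuming each query is answered exactly once per model and condition; the abstract does not state this explicitly, but it is consistent with the ~90,000 figure. The baseline labels `basic_prompt` and `rag` are hypothetical names chosen for illustration.

```python
from itertools import product

# Four support roles named in the abstract.
SUPPORT_ROLES = ["Inform", "Coach", "Relate", "Listen"]
# Two baseline controls; these label strings are hypothetical.
BASELINES = ["basic_prompt", "rag"]
# Three evaluated models, as listed in the abstract.
MODELS = ["GPT-4o-mini", "Llama-3.1-8B-Instruct", "MedGemma-1.5-4b-it"]
NUM_QUERIES = 5_000

conditions = SUPPORT_ROLES + BASELINES
# Every (model, condition) pair in the evaluation grid.
grid = list(product(MODELS, conditions))

# Assumption: one response per query for each (model, condition) pair.
total_responses = len(grid) * NUM_QUERIES

print(f"{len(conditions)} conditions x {len(MODELS)} models = {len(grid)} cells")
print(f"{total_responses:,} responses in total")
```

Under that assumption, the grid has 18 cells and yields 90,000 responses, matching the approximate count reported for the release.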