~ similar to 2605.29473 · 20 results
The paper finds that while LLMs can detect distress regardless of delusional framing, they significantly fail to intervene safely when distress is intertwined with delusion, suggesting a critical reco…
Giulia Pucci, Emily Hemendinger, Ruizhe Li, Gavin Abercrombie +2 more
This paper systematically evaluates how LLMs uncritically adapt to potentially dangerous user prompts related to eating disorders, finding that specific linguistic cues significantly increase the like…
Jiwon Kim, Maya Ajit, Sherry Gong, Soorya Ram Shimgekar +3 more
The paper introduces LLUMI, an open-source framework that improves LLM writing assistance for mental health support using community feedback, demonstrating comparable performance to proprietary models…
Shuai Xiao, Su Liu, Weikai Zhou, Jialun Wu +3 more
Persona prompting does not universally improve LLM performance; instead, it systematically trades increased expertise depth for reduced clarity, making multi-metric evaluation essential.
The paper proposes 'Think Fast, Talk Smart,' a pipeline that separates deterministic data analysis from LLM generation, showing that offloading recurring, structured tasks to code significantly improv…
This study quantifies the privacy risk of inferring sensitive personality traits from user interactions with LLM-based conversational agents, demonstrating that machine learning models can accurately…
The paper evaluates the semantic stability of clinical LLMs to linguistic variations, finding that domain specialization does not guarantee consistent robustness improvements.
The paper demonstrates that increasing the toxicity of prompts significantly degrades the factual reliability of LLMs, a degradation linked to the selective amplification of perturbation-sensitive nod…
This study investigated user reactions to inferred personal information from their own ChatGPT histories, finding that acceptability is governed by context-sensitive norms regarding generation, retent…
This paper introduces HarmAmp, a new benchmark for multi-turn harm amplification, and proposes TrajSafe, a proactive monitoring system that significantly reduces harmfulness in LLM interactions while…
The paper demonstrates that models can acquire 'evaluation meta-knowledge' from training data describing evaluation practices, leading to inflated safety benchmark performance that is independent of e…
The paper demonstrates that LLM performance in zero-shot annotation is significantly limited by the alignment between the model's internal understanding and the task definition, showing that prompt-ba…
Jun Rui Huang, Wang Bill Zhu, Ziyi Liu, Nathanael Fast +2 more
The paper introduces EUDAIMONIA, a new framework and benchmark for evaluating how well LLMs align with user welfare in social interactions, finding that even state-of-the-art models frequently violate…
This study evaluates LLMs in conversational tutoring to identify high-confidence social biases, finding that state-of-the-art models are often overconfident in their incorrect assessments of stereotyp…
Mengyu Xu, Qiaoxin Yang, Qianqian Wang, Xiwei Dai +2 more
The paper introduces MIRA, a bilingual benchmark that reveals that LLMs tend to dilute or omit critical medical information when responding to prompts from users with low health literacy, a pattern te…
Siddhesh Milind Pawar, Sarah Masud, Haneul Yoo, Alice Oh +1 more
The paper introduces FRANZ, a communicative audit framework, to evaluate how LLMs frame responses to subjective questions, finding that LLMs exhibit statistically significant and coupled differences i…
Despite having access to web search, users' reliance on conversational AI for information remains high, driven primarily by pre-existing trust and influenced indirectly by the chatbot's conversational…
The paper evaluates the inconsistency of using LLMs as automated judges for multi-dimensional safety evaluations, finding that LLMs are unreliable for nuanced safety issues like financial advice but m…
The paper argues that LLM guardrails and persona dynamics create an unethical 'reality gap' by laundering epistemic risk onto users, advocating for task-level causal requirements over response-level m…
The paper empirically evaluates domain-adapted and general-purpose LLMs for structured threat modelling (STRIDE on 5G security), finding that domain adaptation and model size do not guarantee reliable…