Similar to 2606.01678 · 20 results
Liuliu Chen, Gowri Rajaram, Eleanor Bailey, Katrina Witt +4 more
The paper introduces an evidence-augmented machine learning approach to improve self-harm surveillance by analyzing Emergency Department triage notes, achieving high and transferable performance acros…
The paper investigates apparent LLM triage failures and concludes that the errors originate in the output format and decision process, rather than a deficiency in the model's underlying clinical knowl…
Giulia Pucci, Emily Hemendinger, Ruizhe Li, Gavin Abercrombie +2 more
This paper systematically evaluates how LLMs uncritically adapt to potentially dangerous user prompts related to eating disorders, finding that specific linguistic cues significantly increase the like…
The study demonstrates that LLMs exhibit significant, language-driven disparities in medical triage recommendations, recommending emergency care more frequently for English and Arabic prompts, even wh…
Baris Karacan, Vaibhav Bhargava, Barbara Di Eugenio, Natalie Parde +20 more
The paper introduces a supervised fine-tuning pipeline using large language models to accurately categorize sentence-level clinical provenance across multi-disciplinary hospital notes, demonstrating t…
The paper evaluates how semantically stable clinical LLMs remain under linguistic variation, finding that domain specialization does not guarantee consistent robustness improvements.
Xiangyu Wang, Zhiwei Yu, Chengze Du, Dingchang Wang +2 more
The paper introduces SuiChat-CN, a novel Chinese group-chat benchmark for contextual suicide risk assessment, demonstrating that multi-party conversational context is crucial for accurate detection.
Liuliu Chen, Elise R. Carrotte, Brian E. Chapman, Jo Robinson +1 more
The paper introduces FigSIM, the first fine-grained dataset for analyzing suicide memes, which is used to benchmark models across tasks like suicide severity and figurative language detection.
Jiwon Kim, Maya Ajit, Sherry Gong, Soorya Ram Shimgekar +3 more
The paper introduces LLUMI, an open-source framework that improves LLM writing assistance for mental health support using community feedback, demonstrating comparable performance to proprietary models…
The paper demonstrates that increasing the toxicity of prompts significantly degrades the factual reliability of LLMs, a degradation linked to the selective amplification of perturbation-sensitive nod…
This paper introduces a framework to audit source-dependence in multi-source RAG systems, demonstrating that disagreement across institutional sources is a common and critical failure mode that curren…
This paper introduces KliniskVestBERT, a suite of BERT models specialized by pre-training on a large, diverse corpus of real-world Norwegian clinical texts, demonstrating superior performance for clin…
The paper finds that while LLMs can detect distress regardless of delusional framing, they significantly fail to intervene safely when distress is intertwined with delusion, suggesting a critical reco…
The authors demonstrate that fine-tuning a two-stage retrieval system using synthetic data generated by large language models can significantly improve the performance of medical semantic search for c…
This paper shows that large language models can automate reproducibility assessments in the social and behavioral sciences.
This paper introduces HarmAmp, a new benchmark for multi-turn harm amplification, and proposes TrajSafe, a proactive monitoring system that significantly reduces harmfulness in LLM interactions while…
The paper proposes 'Think Fast, Talk Smart,' a pipeline that separates deterministic data analysis from LLM generation, showing that offloading recurring, structured tasks to code significantly improv…
Zihang Fu, Fanxiao Li, Jianyang Gu, Haonan Wang +4 more
The paper introduces EvoNote, a self-evolving agentic framework that significantly improves the generation of evidence-grounded health community notes by utilizing an accumulated memory of past misinf…
This paper evaluates multiple LLMs (DeepSeek-R1, OpenBioLLM-Llama3, Qwen 3.5) for generating privacy-safe, high-quality synthetic mental health reports, demonstrating their effectiveness in expanding…
The paper introduces LinguIUTics, a system that significantly improves the classification of rare psychological defense mechanisms in conversational text by fine-tuning Qwen3-8B using specialized imba…