~ similar to 2604.27014v1· 20 results
Jiwon Kim, Maya Ajit, Sherry Gong, Soorya Ram Shimgekar +3 more
The paper introduces LLUMI, an open-source framework that improves LLM writing assistance for mental health support using community feedback, demonstrating comparable performance to proprietary models…
The paper evaluates the semantic stability of clinical LLMs to linguistic variations, finding that domain specialization does not guarantee consistent robustness improvements.
The paper proposes 'Think Fast, Talk Smart,' a pipeline that separates deterministic data analysis from LLM generation, showing that offloading recurring, structured tasks to code significantly improv…
The paper introduces Med-HEAL, a comprehensive framework and dataset for systematically identifying and mitigating hallucinations in medical LLMs, demonstrating that a self-critique pipeline significa…
This paper introduces KliniskVestBERT, a suite of BERT models specialized by pre-training on a large, diverse corpus of real-world Norwegian clinical texts, demonstrating superior performance for clin…
The authors demonstrate that fine-tuning a two-stage retrieval system using synthetic data generated by large language models can significantly improve the performance of medical semantic search for c…
Chao Ding, Mouxiao Bian, Tianbin Li, Minjia Yuan +11 more
The paper introduces SafeMed-R1, a clinically audited LLM that significantly improves safety and ethical alignment for medical applications, matching or exceeding resident performance on safety-critic…
LLM-FACETS introduces an open-source, privacy-preserving framework designed to enable non-technical domain experts and compliance officers to audit and evaluate the transparency and accountability of…
Yaxin Luo, Jiacheng Cui, Xiaohan Zhao, Xinyi Shang +4 more
The paper introduces LLMSurgeon, a framework that estimates the domain-level data mixture of a Large Language Model (LLM) using only generated text, thereby providing a post-hoc method to audit the mo…
This study benchmarks four local LLMs for natural-language-to-SQL querying in biopharma manufacturing, finding that general-purpose code-tuned models like Llama 3.1 8B and Qwen 2.5 Coder 7B outperform…
The paper proposes RPSG, a method that uses private seeds and differential privacy to generate highly realistic and strongly privacy-preserving synthetic data replicas of private text for LLMs.
This paper shows that large language models can automate reproducibility assessments in the social and behavioral sciences.
The paper introduces PrivacySIM, an evaluation suite that benchmarks how well LLMs can simulate individual user privacy decisions based on persona attributes, finding that while conditioning improves…
The paper addresses 'Template Collapse' in 3D CT report generation—where models generate generic reports—by proposing CLarGen, a decoupled framework that significantly improves clinical accuracy and d…
Michele Miranda, Xinlan Yan, Nishant Mishra, Rachel Murphy +3 more
This paper conducts the first comparative study of Differential Privacy (DP), Named Entity Recognition (NER), and Large Language Models (LLMs) for de-identifying Dutch clinical notes, finding that com…
The paper proposes a novel framework that enables multiple institutions to jointly train a synthetic genomic data generator without revealing their raw data, thereby facilitating large-scale, privacy-…
This paper introduces Swiss-Bench 003, an expanded evaluation framework assessing LLM reliability and adversarial security across eight dimensions using 808 Swiss-specific items, revealing that self-g…
Jiahao Chen, Qi Zhang, Ruixiao Lin, Chunyi Zhou +6 more
The paper introduces the PrivacyIceberg framework to systematically categorize and empirically demonstrate the high risk of automated, deep personal profiling using LLM agents, revealing a significant…
This paper demonstrates that patient-facing RAG chatbots frequently expose sensitive system configurations, knowledge base details, and conversation history through client-server communication, posing…
Mengyu Xu, Qiaoxin Yang, Qianqian Wang, Xiwei Dai +2 more
The paper introduces MIRA, a bilingual benchmark that reveals that LLMs tend to dilute or omit critical medical information when responding to prompts from users with low health literacy, a pattern te…