~ similar to 2606.01301· 19 results
The paper introduces two methods, ermodel and ermodel, to significantly reduce hallucinations in clinical summarization by using hallucination detectors to guide iterative revisions and subsequently…
The paper introduces Responsible Contrastive Soft Prompting (RCSP), a parameter-efficient method using soft prompts to improve LLM reliability by simultaneously suppressing hallucinations, encouraging…
The paper introduces AMNESIA, the first large-scale, open-source benchmark for medical unlearning, demonstrating that current unlearning methods struggle to separate individual patient data from share…
This paper evaluates multiple LLMs (DeepSeek-R1, OpenBioLLM-Llama3, Qwen 3.5) for generating privacy-safe, high-quality synthetic mental health reports, demonstrating their effectiveness in expanding…
The paper proposes a memory-augmented, three-stage agentic pipeline that significantly reduces LLM hallucinations and improves operational efficiency by integrating semantic caching and advanced obser…
Jiwon Kim, Maya Ajit, Sherry Gong, Soorya Ram Shimgekar +3 more
The paper introduces LLUMI, an open-source framework that improves LLM writing assistance for mental health support using community feedback, demonstrating comparable performance to proprietary models…
Buyun Liang, Jinqi Luo, Liangzu Peng, Kwan Ho Ryan Chan +5 more
The paper introduces REALISTA, a novel latent-space adversarial attack framework that generates semantically realistic and coherent prompts to effectively induce hallucinations in large language model…
The paper introduces Citation Grounding (CG), a novel metric and framework, to systematically detect and reduce the hallucination of legal citations by verifying LLM outputs against a massive, structu…
Yusheng He, Jizhe Zhou, Xia Du, Zheng Lin +2 more
This paper systematically analyzes how different architectural components of Large Vision-Language Models (LVLMs) contribute to hallucination robustness, finding that joint enhancement of visual fidel…
Chenhao Fang, Jordi Mola, Mark Harman, Jason Nawrocki +9 more
The paper introduces a Hybrid Utility Minimum Bayes Risk (HUMBR) framework to significantly reduce hallucinations in high-stakes enterprise AI workflows, outperforming standard consistency methods.
The paper introduces CHARM, a novel framework that detects and mitigates cascading hallucination—the amplification of errors across multi-step agentic RAG pipelines—achieving an 82.1% reduction in err…
The paper introduces Adaptive Unlearning (AU), a post-deployment framework that surgically suppresses code-related hallucinations, significantly reducing the risk of package confusion attacks like slo…
The paper introduces the Calibrated Entropy Score (CES), a single-pass, black-box method that uses the distribution of token-level entropies to detect model hallucinations with high accuracy and forma…
The paper introduces the DECK taxonomy, a novel framework that classifies LLM hallucinations not by their content error, but by their detectability signature based on inter-sample consistency and toke…
The paper introduces MedCase-Structured, a synthetic, FHIR-formatted dataset designed to benchmark diagnostic reasoning in realistic EHR settings, showing that LLMs perform worse on structured data th…
The paper proposes 'Think Fast, Talk Smart,' a pipeline that separates deterministic data analysis from LLM generation, showing that offloading recurring, structured tasks to code significantly improv…
The paper proposes sampling directly from approximations of an LLM posterior, conditioned on high-scoring regions, to generate more coherent and useful text compared to existing post-hoc hallucination…
Junqi Liu, Salena Song, Yuhan Wang, Jiawei Mao +11 more
The paper introduces AutoMedBench, a novel workflow-aware benchmark that evaluates autonomous medical-AI agents across a five-stage research process, revealing that agents struggle most with validatio…
Hanze Li, Jinhao You, Yichen Guo, Kai Tang +2 more
The paper introduces DeLask, a novel decoding framework that dynamically skips or partially aggregates problematic decoder layers to significantly mitigate hallucinations in Large Language Models.