Blending Human and LLM Expertise to Detect Hallucinations and Omissions in Mental Health Chatbot Responses

About

As LLM-powered chatbots are increasingly deployed in mental health services, detecting hallucinations and omissions has become critical for user safety. However, state-of-the-art LLM-as-a-judge methods often fail in high-risk healthcare contexts, where subtle errors can have serious consequences. We show that leading LLM judges achieve only 52% accuracy on mental health counseling data, with some hallucination detection approaches exhibiting near-zero recall. We identify the root cause as LLMs' inability to capture nuanced linguistic and therapeutic patterns recognized by domain experts. To address this, we propose a framework that integrates human expertise with LLMs to extract interpretable, domain-informed features across five analytical dimensions: logical consistency, entity verification, factual accuracy, linguistic uncertainty, and professional appropriateness. Experiments on a public mental health dataset and a new human-annotated dataset show that traditional machine learning models trained on these features achieve 0.717 F1 on our custom dataset and 0.849 F1 on a public benchmark for hallucination detection, with 0.59-0.64 F1 for omission detection across both datasets. Our results demonstrate that combining domain expertise with automated methods yields more reliable and transparent evaluation than black-box LLM judging in high-stakes mental health applications.
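
The abstract reports accuracy, recall, and F1 side by side, and the distinction matters for a detector that rarely flags anything. The following is a minimal sketch, not the authors' evaluation code: `binary_metrics` is a hypothetical helper, and the labels are made-up toy data in which 1 marks a response that contains a hallucination. It illustrates how a judge that never flags a hallucination can still score moderate accuracy while its recall and F1 are zero.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hallucination caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # hallucination missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean response passed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data (assumed for illustration): 4 hallucinated and 6 clean responses.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# A judge that never flags anything: moderate accuracy, zero recall and F1.
never_flags = [0] * 10
metrics_never = binary_metrics(y_true, never_flags)

# A detector that catches 3 of 4 hallucinations with 1 false alarm.
detector = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics_detector = binary_metrics(y_true, detector)

print(metrics_never)
print(metrics_detector)
```

On this toy data the silent judge is right on the six clean responses but misses every hallucination, which is why recall and F1 are more informative than accuracy alone in this setting.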

Khizar Hussain, Bradley A. Malin, Zhijun Yin, Susannah Leigh Rose, Murat Kantarcioglu • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Hallucination Detection	Custom Dataset	F1 Score	72.9	15
Omission Detection	Custom Dataset	Accuracy	64.5	7
Hallucination Detection	Kaggle Mental Health Dataset	F1 Score	84.9	5
Omission Detection	Kaggle Mental Health Dataset	F1 Score	59.1	5
Hallucination Detection	Kaggle Dataset	Accuracy	85.4	4
Omission Detection	Kaggle Dataset	Accuracy	61.4	4
