SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs

About

Large language models (LLMs) are increasingly deployed across diverse domains, yet they are prone to generating factually incorrect outputs, commonly known as "hallucinations." Among existing mitigation strategies, uncertainty-based methods are particularly attractive due to their ease of implementation, independence from external data, and compatibility with standard LLMs. In this work, we introduce a novel and scalable uncertainty-based semantic clustering framework for automated hallucination detection. Our approach leverages sentence embeddings and hierarchical clustering alongside a newly proposed inconsistency measure, SINdex, to yield more homogeneous clusters and more accurate detection of hallucination phenomena across various LLMs. Evaluations on prominent open- and closed-book QA datasets demonstrate that our method achieves AUROC improvements of up to 9.3% over state-of-the-art techniques. Extensive ablation studies further validate the effectiveness of each component in our framework.
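The general idea of the pipeline described above (embed several sampled answers to the same question, group them by meaning with hierarchical clustering, then score how spread out the groups are) can be illustrated with a minimal sketch. This is not the paper's implementation: the toy 2-D vectors stand in for real sentence embeddings, the average-linkage rule and the 0.2 distance threshold are assumptions, and the plain Shannon entropy over cluster proportions is a generic stand-in for the SINdex measure, whose exact definition is not given in this abstract.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative_clusters(embeddings, threshold):
    """Average-linkage agglomerative clustering over cosine distance.

    Starts with one cluster per response and repeatedly merges the two
    closest clusters until the closest pair is farther apart than
    `threshold` (a hypothetical cut-off chosen for illustration).
    Returns a list of clusters, each a list of response indices.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(a, b):
        dists = [cosine_distance(embeddings[i], embeddings[j]) for i in a for j in b]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > threshold:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def cluster_entropy(clusters):
    """Shannon entropy (in nats) of the cluster-size distribution.

    A stand-in inconsistency score, NOT the SINdex formula: 0 when all
    responses share one meaning, larger when they split across meanings.
    """
    total = sum(len(c) for c in clusters)
    return -sum((len(c) / total) * math.log(len(c) / total) for c in clusters)


# Toy "embeddings": three near-identical answers vs. two pairs that disagree.
consistent = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05]]
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

consistent_score = cluster_entropy(agglomerative_clusters(consistent, threshold=0.2))
mixed_score = cluster_entropy(agglomerative_clusters(mixed, threshold=0.2))
```

In this sketch the consistent sample collapses into a single cluster and scores zero, while the mixed sample splits into two equal clusters and scores higher; a detector built this way would flag the higher-scoring question as more likely to be hallucinated.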

Samir Abdaljalil, Hasan Kurban, Parichit Sharma, Erchin Serpedin, Rachad Atat • 2025

Related benchmarks

Task	Dataset	Result
Hallucination Detection	TriviaQA	--	625
Hallucination Detection	NQ	AUC 0.783	199
Hallucination Detection	BioASQ	AUROC 0.8137	104
Hallucination Detection	SQuAD	AUC 78.26	40
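
For readers unfamiliar with the metric in the Result column, AUROC (also written AUC) is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The value lies in [0, 1] and is sometimes reported multiplied by 100. The sketch below computes it by direct pairwise comparison; the function name and the toy scores and labels are illustrative assumptions, not data from the paper.

```python
def auroc(scores, labels):
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly.

    `labels` uses 1 for the positive class (here, a hallucinated answer)
    and 0 for the negative class. Tied scores count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical inconsistency scores and ground-truth hallucination labels.
example = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])  # 3 of 4 pairs ordered correctly
```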
