A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs

About

Large Language Models (LLMs) have the tendency to hallucinate, i.e., to sporadically generate false or fabricated information. This presents a major challenge, as hallucinations often appear highly convincing and users generally lack the tools to detect them. Uncertainty quantification (UQ) provides a framework for assessing the reliability of model outputs, aiding in the identification of potential hallucinations. In this work, we introduce pre-trained UQ heads: supervised auxiliary modules for LLMs that substantially enhance their ability to capture uncertainty compared to unsupervised UQ methods. Their strong performance stems from the powerful Transformer architecture in their design and informative features derived from LLM attention maps. Experimental evaluation shows that these heads are highly robust and achieve state-of-the-art performance in claim-level hallucination detection across both in-domain and out-of-domain prompts. Moreover, these modules demonstrate strong generalization to languages they were not explicitly trained on. We pre-train a collection of UQ heads for popular LLM series, including Mistral, Llama, and Gemma 2. We publicly release both the code and the pre-trained heads.

Artem Shelmanov, Ekaterina Fadeeva, Akim Tsvigun, Ivan Tsvigun, Zhuohan Xie, Igor Kiselev, Nico Daheim, Caiqi Zhang, Artem Vazhentsev, Mrinmaya Sachan, Preslav Nakov, Timothy Baldwin • 2025

Related benchmarks

Task	Dataset	Result
Medical LLM Risk Triage	RETINA-SAFE Stage-1	Unsafe Recall 96.12	60
Short-form generation	Short-form generation ID	PRR 66	38
Long-form generation	Long-form generation ID	PRR 0.31	38
Short-form generation	Short-form generation datasets 1D-SameTask - OOD	PRR 0.24	24
Long-form generation	DiffTask OOD	PRR 0.05	24
Recognized vs. Unrecognized Classification	PopVQA and iNaturalist Attribution Tree	Gemma 83.1	24
Long-form generation	Long-form generation datasets 1D-SameTask - OOD	PRR 0.05	24
Unknown entity vs. Visual evidence failure Classification	PopVQA and iNaturalist Attribution Tree	Gemma 57.5	24
Short-form generation	Short-form generation datasets LOO - near OOD	PRR 0.34	24
Long-form generation	Long-form generation datasets LOO - near OOD	PRR -0.02	24

Showing 10 of 13 rows
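
The PRR figures in the table are not defined on this page. Assuming PRR stands for Prediction Rejection Ratio, as it commonly does in the uncertainty quantification literature, the sketch below shows one way such a score can be computed. The function names (`rejection_curve_area`, `prr`) and the exact normalization are illustrative assumptions, not the evaluation code behind the reported numbers.

```python
def rejection_curve_area(quality, order):
    """Average, over all rejection steps, of the mean quality of retained items.

    `order` lists item indices from first-rejected to last-rejected.
    """
    n = len(quality)
    retained_sum = sum(quality)
    total = 0.0
    for k, idx in enumerate(order):
        # k items have been rejected so far, n - k are still retained
        total += retained_sum / (n - k)
        retained_sum -= quality[idx]
    return total / n


def prr(uncertainty, quality):
    """Prediction Rejection Ratio (assumed definition).

    Compares rejection by uncertainty against a random baseline and an
    oracle that rejects the lowest-quality outputs first.
    """
    n = len(quality)
    indices = range(n)
    # Most uncertain outputs are rejected first
    by_uncertainty = sorted(indices, key=lambda i: -uncertainty[i])
    # Oracle: worst-quality outputs are rejected first
    by_oracle = sorted(indices, key=lambda i: quality[i])

    auc_unc = rejection_curve_area(quality, by_uncertainty)
    auc_oracle = rejection_curve_area(quality, by_oracle)
    # Random rejection keeps the expected mean quality constant
    auc_random = sum(quality) / n

    denom = auc_oracle - auc_random
    if denom == 0:
        return 0.0  # all outputs have equal quality; the ratio is undefined
    return (auc_unc - auc_random) / denom


# Toy data: quality 1 = correct output, 0 = hallucinated output
quality = [1, 1, 0, 0]
good_scores = [0.1, 0.2, 0.8, 0.9]   # high uncertainty on the wrong outputs
bad_scores = [0.9, 0.8, 0.2, 0.1]    # high uncertainty on the correct outputs
print(prr(good_scores, quality), prr(bad_scores, quality))
```

Under this assumed definition, a score that ranks outputs as well as the oracle reaches 1, a score no better than random rejection sits near 0, and a score that ranks worse than random is negative, which is how a value such as -0.02 can arise.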
