DECK: A Consistency x Confidence Taxonomy of LLM Hallucinations

About

Existing hallucination taxonomies classify LLM errors by what is wrong with the output -- memorised misconceptions, reasoning failures, fluent fabrications. These taxonomies are useful for diagnosis but cannot answer a different question: which uncertainty scorer would have caught this error? We propose a complementary taxonomy that classifies errors by their detectability signature -- the signal a scorer family would read. The DECK taxonomy is a 2x2 partition along inter-sample consistency and token-level confidence into four behavioural regimes (Drift, Entrenched, Confabulation, Knotted), each mapping to a specific scorer family (or families) that can detect it: black-box consistency scorers have signal in D and C, white-box token-probability scorers have signal in K and C, and only an LLM-as-a-Judge with independent pretraining can detect E. Cell membership is operationalised by a Youden's J optimal split on each scorer axis. Across three models and four datasets we validate the taxonomy two ways: by analysing scorer-pair disagreement, and by checking that external labels (SelfAware unanswerable, HaluEval adversarial, PopQA entity popularity) land in the predicted DECK cells, with model-scale and content-specific secondary-cell refinements. We further identify a universal blind spot of output-level UQ: on knowledge-gap inputs where the generator emits confident, repeatable fabrications, every output-level family collapses by construction. A linear probe on Llama-3-8B's hidden states also collapses to chance, giving preliminary evidence that the failure may persist at the activation level; richer internal-state methods (UQ heads, information-theoretic estimators) remain to be tested.
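The cell assignment described above can be sketched as follows. This is a minimal illustration, not the author's implementation. It assumes both scores are oriented so that higher means more consistent or more confident, and it infers the quadrant layout from the scorer-signal statements in the abstract (consistency scorers flag D and C, token-probability scorers flag K and C, neither flags E). The function names `youden_threshold` and `deck_cell` are hypothetical.

```python
def youden_threshold(scores, is_error):
    """Pick the cut t that maximises Youden's J = TPR - FPR
    for the rule "flag as error if score <= t".

    Assumption: higher score = more consistent / more confident.
    """
    n_err = sum(1 for e in is_error if e)
    n_ok = len(is_error) - n_err
    if n_err == 0 or n_ok == 0:
        raise ValueError("need at least one error and one correct answer")

    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, e in zip(scores, is_error) if e and s <= t)
        fp = sum(1 for s, e in zip(scores, is_error) if not e and s <= t)
        j = tp / n_err - fp / n_ok
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def deck_cell(consistency, confidence, t_cons, t_conf):
    """Map one erroneous answer to a DECK cell.

    Quadrant layout is inferred from the abstract, not stated in it:
      C: low consistency, low confidence  (both scorer families have signal)
      D: low consistency, high confidence (only consistency scorers)
      K: high consistency, low confidence (only token-probability scorers)
      E: high consistency, high confidence (neither output-level family)
    """
    low_cons = consistency <= t_cons
    low_conf = confidence <= t_conf
    if low_cons and low_conf:
        return "C"
    if low_cons:
        return "D"
    if low_conf:
        return "K"
    return "E"


# Toy data (made up for illustration): two errors, two correct answers.
consistency = [0.1, 0.2, 0.8, 0.9]
confidence = [0.3, 0.9, 0.7, 0.95]
is_error = [True, True, False, False]

t_cons = youden_threshold(consistency, is_error)
t_conf = youden_threshold(confidence, is_error)
cells = [deck_cell(c, p, t_cons, t_conf)
         for c, p, e in zip(consistency, confidence, is_error) if e]
```

Fitting each threshold independently on its own axis mirrors the "optimal split on each scorer axis" wording; a joint fit over both axes would be a different design and is not what the abstract describes.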

Mohit Singh Chauhan • 2026

Related benchmarks

Task	Dataset	Result
Hallucination Detection	TriviaQA	AUROC 0.844	625
Hallucination Detection	HaluEval	AUROC 0.704	135
Hallucination Detection	PopQA	AUC 77	97
Hallucination Detection	TriviaQA Llama outputs (test)	--	15
Hallucination Detection	TriviaQA GPT outputs (test)	--	15
Hallucination Detection	TriviaQA Gemini outputs (test)	--	15
Hallucination Detection	HaluEval Llama outputs (test)	--	15
Hallucination Detection	HaluEval GPT outputs (test)	--	15
Hallucination Detection	HaluEval Gemini outputs (test)	--	15
Hallucination Detection	SelfAware Llama outputs (test)	--	15

Showing 10 of 20 rows
