DECK: A Consistency x Confidence Taxonomy of LLM Hallucinations
About
Existing hallucination taxonomies classify LLM errors by what is wrong with the output -- memorised misconceptions, reasoning failures, fluent fabrications. These taxonomies are useful for diagnosis but cannot answer a different question: which uncertainty scorer would have caught this error? We propose a complementary taxonomy that classifies errors by their detectability signature -- the signal a scorer family would read. The DECK taxonomy is a 2x2 partition along inter-sample consistency and token-level confidence into four behavioural regimes (Drift, Entrenched, Confabulation, Knotted), each mapping to the scorer family (or families) that can detect it: black-box consistency scorers have signal in D and C, white-box token-probability scorers have signal in K and C, and only an LLM-as-a-Judge with independent pretraining can detect E. Cell membership is operationalised by splitting each scorer axis at its Youden's J-optimal threshold. Across three models and four datasets we validate the taxonomy in two ways: by analysing scorer-pair disagreement, and by checking that external labels (SelfAware unanswerable, HaluEval adversarial, PopQA entity popularity) land in the predicted DECK cells, with model-scale and content-specific secondary-cell refinements. We further identify a universal blind spot of output-level uncertainty quantification (UQ): on knowledge-gap inputs where the generator emits confident, repeatable fabrications, every output-level scorer family collapses by construction. A linear probe on Llama-3-8B's hidden states also collapses to chance, giving preliminary evidence that the failure may persist at the activation level; richer internal-state methods (UQ heads, information-theoretic estimators) remain to be tested.
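The cell assignment can be illustrated with a minimal Python sketch; the authors' implementation may well differ. It assumes each output has a consistency score and a confidence score where higher means more trustworthy, plus a binary error label used to fit the thresholds, and that an output is flagged when its score falls below the threshold. The mapping of cells to axis sides is inferred from which scorer families are said to have signal where: D and C sit on the low-consistency side, K and C on the low-confidence side, and E is high on both. The function names `youden_threshold` and `deck_cell` and the toy data are hypothetical.

```python
from typing import Sequence


def youden_threshold(scores: Sequence[float], is_error: Sequence[bool]) -> float:
    """Return the threshold t that maximises Youden's J = TPR - FPR.

    Assumption (not from the source): an output is flagged as a likely
    error when score < t, i.e. low consistency / low confidence is the
    suspicious direction on each axis.
    """
    if len(scores) != len(is_error):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for e in is_error if e)
    n_neg = len(is_error) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both error and non-error examples")

    # Candidate thresholds: every distinct score, plus one above the maximum
    # so that "flag everything" is also considered. O(n^2), fine for a sketch.
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    best_j, best_t = float("-inf"), candidates[0]
    for t in candidates:
        tp = sum(1 for s, e in zip(scores, is_error) if e and s < t)
        fp = sum(1 for s, e in zip(scores, is_error) if not e and s < t)
        j = tp / n_pos - fp / n_neg  # sensitivity + specificity - 1
        if j > best_j:  # ties keep the lowest threshold
            best_j, best_t = j, t
    return best_t


def deck_cell(consistency: float, confidence: float,
              t_cons: float, t_conf: float) -> str:
    """Assign one output to a DECK cell.

    The axis-to-cell mapping is inferred from which scorer family is said
    to have signal in each cell; it is an assumption of this sketch.
    """
    high_cons = consistency >= t_cons
    high_conf = confidence >= t_conf
    if high_cons and high_conf:
        return "E"  # Entrenched: neither output-level axis flags it
    if high_cons:
        return "K"  # Knotted: consistent but low confidence (white-box signal)
    if high_conf:
        return "D"  # Drift: confident but inconsistent (black-box signal)
    return "C"      # Confabulation: low on both axes (both families)


if __name__ == "__main__":
    # Toy data, made up for illustration: per-output scores and error labels.
    consistency = [0.95, 0.90, 0.30, 0.20, 0.92, 0.25, 0.85, 0.15]
    confidence = [0.90, 0.20, 0.85, 0.10, 0.95, 0.90, 0.80, 0.05]
    is_error = [False, True, True, True, True, False, False, True]

    # Fit one Youden's J threshold per scorer axis on all outputs.
    t_cons = youden_threshold(consistency, is_error)
    t_conf = youden_threshold(confidence, is_error)

    # DECK classifies hallucinated outputs, so only errors get a cell.
    for cons, conf, err in zip(consistency, confidence, is_error):
        if err:
            print(cons, conf, deck_cell(cons, conf, t_cons, t_conf))
```

The quadratic threshold scan keeps the logic transparent; for large evaluation sets, a single sorted sweep over the scores gives the same J-optimal threshold more efficiently.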
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Hallucination Detection | TriviaQA | AUROC 0.844 | 621 |
| Hallucination Detection | HaluEval | AUROC 0.704 | 131 |
| Hallucination Detection | PopQA | AUC 77 | 97 |
| Hallucination Detection | TriviaQA Llama outputs (test) | -- | 15 |
| Hallucination Detection | TriviaQA GPT outputs (test) | -- | 15 |
| Hallucination Detection | TriviaQA Gemini outputs (test) | -- | 15 |
| Hallucination Detection | HaluEval Llama outputs (test) | -- | 15 |
| Hallucination Detection | HaluEval GPT outputs (test) | -- | 15 |
| Hallucination Detection | HaluEval Gemini outputs (test) | -- | 15 |
| Hallucination Detection | SelfAware Llama outputs (test) | -- | 15 |