Where Does Reasoning Break? Step-Level Hallucination Detection via Hidden-State Transport Geometry

About

Large language models hallucinate during multi-step reasoning, but most existing detectors operate at the trace level: they assign one confidence score to a full output, fail to localize the first error, and often require multiple sampled completions. We frame hallucination instead as a property of the hidden-state trajectory produced during a single forward pass. Correct reasoning moves through a stable manifold of locally coherent transitions; a first error appears as a localized excursion in transport cost away from this manifold. We operationalize this view with a label-conditioned teacher that builds a trace-specific contrastive PCA lens and scores each step with seven geometric transition features, and a deployable BiLSTM student distilled from the teacher that operates on raw hidden states without inference-time labels. We prove that contrastive PCA is the optimal projection for a transport-separation objective between first-error and correct states, and that single-pass first-error localization holds whenever the first error creates a positive transport margin over preceding correct transitions. On ProcessBench, PRM800K, HaluEval, and TruthfulQA, both models outperform entropy-based, probing-based, and attention-based baselines in-domain; the teacher transfers stably across language models and datasets, while the student collapses under shift, a gap our distillation theory predicts. These results recast step-level hallucination detection as a problem of trajectory dynamics and identify the central obstacle to deployment: preserving the contrastive transport margin under distribution shift.
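To make the idea of a contrastive PCA lens and a transport margin concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it uses the textbook form of contrastive PCA (top eigenvectors of the foreground covariance minus a scaled background covariance), a squared Euclidean displacement as a stand-in for transport cost, and a simple running-maximum rule for localization. The function names, the `alpha` and `margin` parameters, and the toy data are all assumptions made for illustration; the seven transition features mentioned in the abstract are not reproduced here.

```python
import numpy as np

def contrastive_pca(foreground, background, k=1, alpha=1.0):
    """Textbook cPCA: top-k eigenvectors of C_fg - alpha * C_bg.

    Assumption: 'foreground' holds error-state vectors and 'background'
    holds correct-state vectors, one row per state.
    """
    c_fg = np.cov(foreground, rowvar=False)
    c_bg = np.cov(background, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(c_fg - alpha * c_bg)
    order = np.argsort(eigvals)[::-1][:k]  # largest eigenvalues first
    return eigvecs[:, order]               # shape (d, k)

def step_costs(hidden_states, lens):
    """Squared displacement between consecutive states after projection.

    Assumption: a stand-in for the paper's transport cost.
    """
    z = hidden_states @ lens
    return np.sum(np.diff(z, axis=0) ** 2, axis=1)

def first_excursion(costs, margin):
    """First transition whose cost exceeds every earlier cost by `margin`.

    Returns the transition index t (the move from state t to state t+1),
    or None if no transition clears the margin.
    """
    for t in range(1, len(costs)):
        if costs[t] > costs[:t].max() + margin:
            return t
    return None

# Toy data (hypothetical): correct states vary along axis 0,
# error states vary along axis 1.
background = np.zeros((5, 4))
background[:, 0] = np.linspace(-1.0, 1.0, 5)
foreground = np.zeros((5, 4))
foreground[:, 1] = np.linspace(-2.0, 2.0, 5)
lens = contrastive_pca(foreground, background, k=1)

# A six-step trajectory that drifts along axis 0, then jumps along axis 1.
trajectory = np.zeros((6, 4))
trajectory[:, 0] = 0.1 * np.arange(6)
trajectory[4:, 1] = 3.0

costs = step_costs(trajectory, lens)
flagged = first_excursion(costs, margin=1.0)
```

In this toy setup the lens ignores the steady drift along axis 0, so only the jump into state 4 produces a large cost. The running-maximum rule mirrors the abstract's condition in spirit: the first error is localized when its cost clears the preceding correct transitions by a positive margin.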

Tyler Alvarez, Ali Baheri • 2026

Related benchmarks

Task	Dataset	Result
Hallucination Detection	HaluEval	AUROC 0.94	135
First-error detection	ProcessBench	Accuracy 68.7	6
First-error detection	PRM800K	Accuracy 92.9	6
First-error detection	TruthfulQA	Accuracy 96.8	6
Step-level hallucination detection	ProcessBench	AUROC 91	6
Step-level hallucination detection	PRM800K	AUROC 99.8	6
Step-level hallucination detection	TruthfulQA	AUROC 0.965	6
