
DRIFT: Detecting Representational Inconsistencies for Factual Truthfulness

About

LLMs often produce fluent but incorrect answers, yet detecting such hallucinations typically requires multiple sampling passes or post-hoc verification, adding significant latency and cost. We hypothesize that intermediate layers encode confidence signals that are lost in the final output layer, and propose a lightweight probe that reads these signals directly from hidden states. The probe adds less than 0.1% computational overhead and can run fully in parallel with generation, enabling hallucination detection before the answer is produced. Building on this, we develop an LLM router that answers confident queries immediately while delegating uncertain ones to stronger models. Despite its simplicity, our method achieves SOTA AUROC on 10 out of 12 settings across four QA benchmarks and three LLM families, with gains of up to 13 points over prior methods, and generalizes across dataset shifts without retraining.
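To make the idea concrete, here is a minimal, illustrative sketch of the two pieces the abstract describes: a lightweight probe trained on intermediate hidden states to score hallucination risk, and a router that delegates high-risk queries to a stronger model. This is not the paper's DRIFT implementation; the synthetic hidden states, the plain logistic-regression probe, and the 0.5 routing threshold are all assumptions for illustration.

```python
import numpy as np

def train_probe(H, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe on hidden states.

    H: (n, d) array of intermediate-layer hidden states.
    y: (n,) array of 0/1 labels (1 = answer was a hallucination).
    Returns probe weights w and bias b.
    """
    w = np.zeros(H.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))  # predicted risk
        g = p - y                                # logistic-loss gradient
        w -= lr * (H.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def probe_score(H, w, b):
    """Hallucination-risk score in [0, 1] for each hidden state."""
    return 1.0 / (1.0 + np.exp(-(H @ w + b)))

def route(score, threshold=0.5):
    """Answer locally when confident; delegate risky queries upstream."""
    return "delegate" if score >= threshold else "answer"
```

In a real system the probe would run on hidden states captured during the forward pass, so scoring happens in parallel with generation; the only added cost is one small matrix-vector product per query.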

Rohan Bhatnagar, Youran Sun, Chi Andrew Zhang, Yixin Wen, Haizhao Yang • 2026

Related benchmarks

Task                     Dataset        Result          Rank
Hallucination Detection  TriviaQA       AUROC 0.9395    265
Hallucination Detection  NQ-Open        AUROC 0.8843    27
Hallucination Detection  MMLU-Pro       AUROC 0.8708    15
Hallucination Detection  WebQuestions   AUROC 0.8767    15
