Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification

About

Reasoning models have achieved remarkable performance on tasks like math and logical reasoning thanks to their ability to search during reasoning. However, they still suffer from overthinking, often performing unnecessary reasoning steps even after reaching the correct answer. This raises the question: can models evaluate the correctness of their intermediate answers during reasoning? In this work, we study whether reasoning models encode information about answer correctness through probing the model's hidden states. The resulting probe can verify intermediate answers with high accuracy and produces highly calibrated scores. Additionally, we find models' hidden states encode correctness of future answers, enabling early prediction of the correctness before the intermediate answer is fully formulated. We then use the probe as a verifier to decide whether to exit reasoning at intermediate answers during inference, reducing the number of inference tokens by 24% without compromising performance. These findings confirm that reasoning models do encode a notion of correctness yet fail to exploit it, revealing substantial untapped potential to enhance their efficiency.
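To make the probing and early-exit idea concrete, here is a minimal sketch rather than the authors' implementation. It assumes a linear (logistic-regression) probe, uses synthetic vectors in place of real hidden states, and picks an arbitrary exit threshold of 0.85. The names `train_probe`, `probe_score`, and `early_exit_index` are hypothetical.

```python
import numpy as np

def train_probe(H, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe: P(correct | h) = sigmoid(w.h + b).

    H: (n, d) array standing in for hidden states at intermediate answers.
    y: (n,) array of 0/1 correctness labels.
    """
    n, d = H.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))  # predicted P(correct)
        grad = p - y                             # gradient of the log loss w.r.t. the logit
        w -= lr * (H.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_score(h, w, b):
    """Probe confidence that the answer associated with h is correct."""
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))

def early_exit_index(scores, threshold=0.85):
    """Index of the first intermediate answer whose score reaches the
    threshold; falls back to the last answer if none does."""
    for i, s in enumerate(scores):
        if s >= threshold:
            return i
    return len(scores) - 1

# Synthetic stand-in data (an assumption for illustration only):
# "correct" and "incorrect" states are shifted along one random direction.
rng = np.random.default_rng(0)
d = 16
direction = rng.normal(size=d)
y = rng.integers(0, 2, size=400).astype(float)
H = rng.normal(size=(400, d)) + np.outer(2 * y - 1, direction)

w, b = train_probe(H, y)
train_acc = ((probe_score(H, w, b) >= 0.5) == (y == 1)).mean()
```

In this sketch, reasoning would stop at the answer returned by `early_exit_index`, and the tokens after it would be saved. The threshold trades token savings against the risk of exiting on a wrong answer, which is why well-calibrated probe scores matter.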

Anqi Zhang, Yulin Chen, Jane Pan, Chen Zhao, Aurojit Panda, Jinyang Li, He He • 2025

Related benchmarks

Task	Dataset	Metric	Result
Mathematical Reasoning	MATH	Accuracy	34.02	882
Mathematical Reasoning	AIME	AIME Accuracy	63.37	288
Mathematical Problem Solving	AIME	AIME Score	1.17e+3	52
Answer Verification	MATH	AUROC	0.879	43
Mathematical Reasoning	Omni-MATH	ECE	0.1104	28
Mathematical Reasoning	MATH (test)	Latency (s)	88.5	26
Mathematical Reasoning	AIME	ECE	8.61	23
Answer Verification	AMC12	AUROC	86.21	22
Mathematical Reasoning	AMC12	Expected Calibration Error (ECE)	0.1064	22
Mathematical Reasoning	MATH	ECE	7.26	17

Showing 10 of 45 rows
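Several rows report Expected Calibration Error (ECE). As a reading aid, here is a minimal sketch of one common way to compute it, using equal-width confidence bins. The page does not state the binning scheme or scale behind these numbers, so the 10-bin default and the function name `expected_calibration_error` are assumptions. The listed ECE values range from 0.1064 to 8.61, which may mean some are fractions and others percentages, but the page does not confirm this.

```python
import numpy as np

def expected_calibration_error(scores, labels, n_bins=10):
    """ECE with equal-width bins: the weighted average, over bins, of
    |empirical accuracy - mean predicted confidence|."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Map each score to a bin index in [0, n_bins - 1]; the interior edges
    # place a score of exactly 1.0 in the last bin.
    bin_ids = np.clip(np.digitize(scores, edges[1:-1]), 0, n_bins - 1)
    ece = 0.0
    for k in range(n_bins):
        mask = bin_ids == k
        if mask.any():
            gap = abs(labels[mask].mean() - scores[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece

# Four answers scored 0.9 each, of which three are correct:
example = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

Here the single occupied bin has mean confidence 0.9 and accuracy 0.75, so the gap between the two is the ECE. A perfectly calibrated verifier would score 0.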
