
Latent Thinking Optimization: Your Latent Reasoning Language Model Secretly Encodes Reward Signals in Its Latent Thoughts

About

Large Language Models (LLMs) excel at problem solving by generating chains of thought in natural language, but such verbal thinking is computationally costly and prone to overthinking. A recent work instead proposes a latent thinking architecture, Huginn-3.5B, which represents intermediate reasoning steps as a sequence of latent representations. However, latent thoughts lack interpretability and are difficult to supervise, raising concerns about the correctness and reliability of the model's latent thinking processes. In this paper, we provide a systematic study of how Huginn-3.5B thinks in the latent space and how external supervision signals can improve its latent thinking processes. We show that latent thoughts leading to correct versus incorrect answers exhibit highly distinguishable patterns, and that a latent classifier can reliably predict answer correctness directly from latent thoughts. Leveraging these insights, we propose Latent Thinking Optimization (LTO), a probabilistic algorithm that employs the latent classifier as a Latent Reward Model (LRM) to optimize the latent thinking processes. Extensive experiments across diverse reasoning tasks demonstrate that the LRM is highly effective in detecting incorrect latent thinking patterns, and that LTO can significantly improve the latent thinking processes. Furthermore, we show that the LRM can generalize across diverse domains, and that LTO can be seamlessly applied to general LLMs to improve their thinking processes. In contrast to verbal thinking, our method demonstrates that reward modeling and scaling test-time thinking with supervision can be performed directly in the latent space, highlighting its potential as a general, efficient, and domain-agnostic approach to improving the thinking processes of LLMs.
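The core idea described above, scoring latent thought trajectories with a learned classifier and using it as a reward signal at test time, can be sketched as a best-of-N selection loop. This is a minimal illustrative sketch, not the paper's actual implementation: the probe weights, trajectory sampler, pooling scheme, and all dimensions are hypothetical stand-ins for the trained LRM and Huginn-3.5B's latent rollouts.

```python
# Hedged sketch: best-of-N selection over latent thought trajectories,
# scored by a stand-in Latent Reward Model (LRM). All names, shapes,
# and the linear probe are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
D = 16  # assumed dimensionality of one latent thought vector

# Toy LRM: a linear probe over the mean-pooled trajectory, standing in
# for the paper's latent classifier trained on correct/incorrect traces.
w = rng.normal(size=D)

def lrm_score(latent_thoughts: np.ndarray) -> float:
    """Estimated probability that a (steps, D) trajectory yields a correct answer."""
    pooled = latent_thoughts.mean(axis=0)      # mean-pool over thinking steps
    return float(1.0 / (1.0 + np.exp(-pooled @ w)))  # sigmoid of probe logit

def sample_trajectory(steps: int = 4) -> np.ndarray:
    """Toy stochastic rollout standing in for the latent thinking model."""
    return rng.normal(size=(steps, D))

def latent_thinking_optimization(n_candidates: int = 8):
    """Sample several latent trajectories and keep the one the LRM prefers."""
    candidates = [sample_trajectory() for _ in range(n_candidates)]
    scores = [lrm_score(c) for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]

best_traj, best_score = latent_thinking_optimization()
```

Because both scoring and selection operate on latent vectors, no intermediate text is ever decoded, which is what makes supervision in the latent space cheaper than verbal chain-of-thought reranking.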

Hanwen Du, Yuxin Dong, Xia Ning • 2025

Related benchmarks

| Task                   | Dataset              | Metric                  | Result | Rank |
|------------------------|----------------------|-------------------------|--------|------|
| Mathematical Reasoning | SVAMP                | Accuracy                | 79.1   | 368  |
| Mathematical Reasoning | GSM8K                | EM                      | 38.5   | 115  |
| Mathematical Reasoning | GSM-Symbolic         | GSM-Sym Accuracy        | 82.1   | 43   |
| Commonsense Reasoning  | CommonsenseQA (CSQA) | Accuracy                | 79     | 38   |
| Code Generation        | MBPP                 | Accuracy                | 60     | 25   |
| Code Reasoning         | MBPP                 | Accuracy                | 38.8   | 23   |
| Mathematical Reasoning | GSM8K                | Accuracy                | 85.9   | 19   |
| Code Generation        | MBPP                 | Answer Correctness Rate | 29.9   | 8    |
| Mathematical Reasoning | SVAMP                | Answer Correctness Rate | 53.8   | 8    |
