
Latent Thinking Optimization: Your Latent Reasoning Language Model Secretly Encodes Reward Signals in Its Latent Thoughts

About

Large Language Models (LLMs) excel at problem solving by generating chains of thought in natural language, but such verbal thinking is computationally costly and prone to overthinking. A recent work instead proposes a latent thinking architecture, Huginn-3.5B, which represents intermediate reasoning steps as a sequence of latent representations. However, latent thoughts lack interpretability and are difficult to supervise, raising concerns about the correctness and reliability of the model's latent thinking processes. In this paper, we provide a systematic study of how Huginn-3.5B thinks in the latent space and how external supervision signals can improve its latent thinking processes. We show that latent thoughts leading to correct versus incorrect answers exhibit highly distinguishable patterns, and that a latent classifier can reliably predict answer correctness directly from latent thoughts. Leveraging these insights, we propose Latent Thinking Optimization (LTO), a probabilistic algorithm that employs the latent classifier as a Latent Reward Model (LRM) to optimize the latent thinking processes. Extensive experiments across diverse reasoning tasks demonstrate that the LRM is highly effective in detecting incorrect latent thinking patterns, and that LTO can significantly improve the latent thinking processes. Furthermore, we show that the LRM can generalize across diverse domains, and that LTO can be seamlessly applied to general LLMs to improve their thinking processes. In contrast to verbal thinking, our method demonstrates that reward modeling and scaling test-time thinking with supervision can be performed directly in the latent space, highlighting its potential as a general, efficient, and domain-agnostic approach to improving the thinking processes of LLMs.
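The core idea described above, scoring latent thought trajectories with a learned classifier and using it as a reward signal at test time, can be sketched as a best-of-N selection loop. This is a minimal illustrative sketch, not the paper's actual implementation: the probe weights, trajectory sampler, pooling scheme, and all dimensions are hypothetical stand-ins for the trained LRM and Huginn-3.5B's latent rollouts.

```python
# Hedged sketch: best-of-N selection over latent thought trajectories,
# scored by a stand-in Latent Reward Model (LRM). All names, shapes,
# and the linear probe are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
D = 16  # assumed dimensionality of one latent thought vector

# Toy LRM: a linear probe over the mean-pooled trajectory, standing in
# for the paper's latent classifier trained on correct/incorrect traces.
w = rng.normal(size=D)

def lrm_score(latent_thoughts: np.ndarray) -> float:
    """Estimated probability that a (steps, D) trajectory yields a correct answer."""
    pooled = latent_thoughts.mean(axis=0)      # mean-pool over thinking steps
    return float(1.0 / (1.0 + np.exp(-pooled @ w)))  # sigmoid of probe logit

def sample_trajectory(steps: int = 4) -> np.ndarray:
    """Toy stochastic rollout standing in for the latent thinking model."""
    return rng.normal(size=(steps, D))

def latent_thinking_optimization(n_candidates: int = 8):
    """Sample several latent trajectories and keep the one the LRM prefers."""
    candidates = [sample_trajectory() for _ in range(n_candidates)]
    scores = [lrm_score(c) for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]

best_traj, best_score = latent_thinking_optimization()
```

Because both scoring and selection operate on latent vectors, no intermediate text is ever decoded, which is what makes supervision in the latent space cheaper than verbal chain-of-thought reranking.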

Hanwen Du, Yuxin Dong, Xia Ning • 2025

Related benchmarks

| Task                   | Dataset              | Metric                  | Result | Rank |
|------------------------|----------------------|-------------------------|--------|------|
| Mathematical Reasoning | SVAMP                | Accuracy                | 79.1   | 368  |
| Mathematical Reasoning | GSM8K                | EM                      | 38.5   | 115  |
| Mathematical Reasoning | GSM-Symbolic         | GSM-Sym Accuracy        | 82.1   | 43   |
| Commonsense Reasoning  | CommonsenseQA (CSQA) | Accuracy                | 79     | 38   |
| Code Generation        | MBPP                 | Accuracy                | 60     | 25   |
| Code Reasoning         | MBPP                 | Accuracy                | 38.8   | 23   |
| Mathematical Reasoning | GSM8K                | Accuracy                | 85.9   | 19   |
| Code Generation        | MBPP                 | Answer Correctness Rate | 29.9   | 8    |
| Mathematical Reasoning | SVAMP                | Answer Correctness Rate | 53.8   | 8    |
