
Self-Consistent Latent Reasoning: Long Latent Sequence Reasoning for Vision-Language Model

About

In language reasoning, longer chains of thought consistently yield better performance, which naturally suggests that visual latent reasoning may likewise benefit from longer latent sequences. However, we discover a counterintuitive phenomenon: the performance of existing latent visual reasoning methods systematically degrades as the latent sequence grows longer. We trace the root cause to Information Gain Collapse: autoregressive generation makes each step highly dependent on prior outputs, so subsequent tokens introduce little new information. We further find that heavily pooled ($\geq 128\times$) image embeddings used as supervision targets provide no more signal than meaningless placeholders. Motivated by these insights, we propose SCOLAR (Self-COnsistent LAtent Reasoning), which introduces a lightweight detransformer that leverages the LLM's full-sequence hidden states to generate auxiliary visual tokens in a single shot, with each token independently anchored to the original visual space. Combined with three-stage SFT and ALPO reinforcement learning, SCOLAR extends the acceptable latent CoT length by over $30\times$, achieves state-of-the-art results among open-source models on real-world reasoning benchmarks (+14.12% over the backbone), and demonstrates strong out-of-distribution generalization.
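To make the pooling remark above concrete, here is a minimal sketch of what a $128\times$ reduction does to a set of token embeddings. It is an illustration only, not the paper's experiment or any part of SCOLAR: the embeddings are random Gaussian vectors, pooling is assumed to be plain averaging over consecutive tokens, and the helper names `average_pool` and `spread` are hypothetical.

```python
import random
import statistics


def average_pool(tokens, factor):
    """Average consecutive groups of `factor` token vectors into one vector each."""
    if len(tokens) % factor != 0:
        raise ValueError("token count must be divisible by the pooling factor")
    dim = len(tokens[0])
    pooled = []
    for start in range(0, len(tokens), factor):
        group = tokens[start:start + factor]
        pooled.append([sum(vec[d] for vec in group) / factor for d in range(dim)])
    return pooled


def spread(tokens):
    """Mean per-dimension variance across tokens: a crude proxy for how much tokens differ."""
    dim = len(tokens[0])
    return statistics.fmean(
        statistics.pvariance([vec[d] for vec in tokens]) for d in range(dim)
    )


# Assumption: random vectors stand in for image patch embeddings.
rng = random.Random(0)
num_tokens, dim, factor = 1024, 16, 128
patch_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]
pooled = average_pool(patch_embeddings, factor)

print(len(patch_embeddings), "->", len(pooled), "tokens")
print("spread before pooling:", round(spread(patch_embeddings), 3))
print("spread after pooling: ", round(spread(pooled), 3))
```

In this toy setting the pooled tokens are far fewer and vary much less than the originals, which gives some intuition for why heavily pooled targets might carry little token-specific signal. Real image embeddings are spatially correlated, so the effect in practice may differ from this random-vector case.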

Chenfeng Wang, Wei He, Xuhan Zhu, Chunpeng Zhou, Qizhen Li, Song Yan, Yufei Zheng, Chengjun Yu, Fan Lu, Wei Zhai, Yang Cao, Pengfei Yu, Zheng-Jun Zha • 2026

Related benchmarks

Task | Dataset | Result | Rank
High-resolution perception | HR-Bench-4K | Overall Score 75.5 | 103
Visual Perception and Reasoning | V*Bench | Attribute Score 86.09 | 49
Visual Reasoning | VisualPuzzles OOD (test) | Overall Accuracy 34.42 | 8
Multimodal Perception and Reasoning | MME-RealWorld-Lite | Overall Score 59.87 | 7
Fine-grained High-Resolution Perception | HRBench8K | Overall Score 67.63 | 7
