Efficient Reasoning via Thought Compression for Language Segmentation

About

Chain-of-thought (CoT) reasoning has significantly improved the performance of large multimodal models in language-guided segmentation, yet its prohibitive computational cost, stemming from generating verbose rationales, limits real-world applicability. We introduce WISE (Wisdom from Internal Self-Exploration), a novel paradigm for efficient reasoning guided by the principle of \textit{thinking twice -- once for learning, once for speed}. WISE trains a model to generate a structured sequence: a concise rationale, the final answer, and then a detailed explanation. By placing the concise rationale first, our method leverages autoregressive conditioning to enforce that the concise rationale acts as a sufficient summary for generating the detailed explanation. This structure is reinforced by a self-distillation objective that jointly rewards semantic fidelity and conciseness, compelling the model to internalize its detailed reasoning into a compact form. At inference, the detailed explanation is omitted. To address the resulting conditional distribution shift, our inference strategy, WISE-S, employs a simple prompting technique that injects a brevity-focused instruction into the user's query. This final adjustment facilitates the robust activation of the learned concise policy, unlocking the full benefits of our framework. Extensive experiments show that WISE-S achieves state-of-the-art zero-shot performance on the ReasonSeg benchmark with 58.3 cIoU, while reducing the average reasoning length by nearly \textbf{5$\times$} -- from 112 to just 23 tokens. Code is available at \href{https://github.com/mrazhou/WISE}{WISE}.

Qing Zhou, Shiyu Zhang, Yuyu Jia, Junyu Gao, Weiping Ni, Junzheng Wu, Qi Wang• 2026

Related benchmarks

Task	Dataset	Result
Referring Expression Segmentation	RefCOCO (testA)	cIoU79.1	315
Referring Expression Segmentation	RefCOCO+ (testA)	cIoU75	288
Referring Expression Segmentation	RefCOCOg (test)	cIoU72.1	166
Reasoning Segmentation	ReasonSeg zero-shot (val)	gIoU63.5	11
Reasoning Segmentation	ReasonSeg zero-shot (test)	gIoU60.3	10

Showing 5 of 5 rows

Other info

Follow for update

@wizwand_team Discord