
Breaking the Overscaling Curse: Thinking Parallelism Before Parallel Thinking

About

Parallel thinking enhances LLM reasoning through multi-path sampling and aggregation. In system-level evaluations, a global parallelism level N is allocated to every sample, typically set large to maximize overall dataset accuracy. Due to sample heterogeneity, however, some samples achieve comparable performance with a smaller N' < N, causing budget redundancy. This incompatibility between system-level efficacy and sample-level efficiency constitutes the overscaling curse. In this paper, we formalize and quantify the overscaling curse, show its universality and severity in practice, and analyze its trigger mechanism. We then propose T2, a lightweight method that breaks the overscaling curse by using latent representations to estimate the optimal parallelism level for each sample before decoding. Experiments show that T2 significantly reduces cost while maintaining comparable performance, enabling more efficient parallel thinking.
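The abstract does not spell out T2's implementation, but the core idea can be sketched as follows: a lightweight estimator maps a sample's latent representation to a per-sample parallelism level N', then only N' reasoning paths are sampled and aggregated (here by majority vote), instead of always spending a large global N. All names below (`predict_parallelism`, `sample_answer`, the difficulty heuristic, the noisy 70%-accurate oracle) are illustrative assumptions, not the paper's actual components.

```python
from collections import Counter
import random

def predict_parallelism(latent, n_max=16):
    # Illustrative stand-in for T2's estimator: map a latent "difficulty"
    # score in [0, 1] to a parallelism level N' in {1, ..., n_max}.
    # (The paper's estimator is computed from latent representations
    # before decoding; this toy uses the mean of a feature vector.)
    difficulty = sum(latent) / len(latent)
    return max(1, round(difficulty * n_max))

def sample_answer(question, rng):
    # Placeholder for one decoded reasoning path that ends in a final
    # answer. Here: a noisy oracle that is correct 70% of the time.
    return question["answer"] if rng.random() < 0.7 else "wrong"

def parallel_think(question, latent, rng):
    # Sample N' paths for this sample and aggregate by majority vote,
    # rather than allocating the same large global N to every sample.
    n = predict_parallelism(latent)
    votes = Counter(sample_answer(question, rng) for _ in range(n))
    return votes.most_common(1)[0][0], n

rng = random.Random(0)
easy = {"answer": "42"}
answer, n_used = parallel_think(easy, latent=[0.1, 0.2], rng=rng)
```

Under this sketch, easy samples (low latent difficulty) get a small N' and hard samples a larger one, which is exactly the per-sample budget saving the abstract attributes to T2.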

Yiming Wang, Zhuosheng Zhang, Rui Wang • 2026

Related benchmarks

| Task                               | Dataset  | Metric                | Result | Rank |
|------------------------------------|----------|-----------------------|--------|------|
| General Knowledge Reasoning        | MMLU-Pro | Accuracy              | 75.84  | 31   |
| General Science Question Answering | GPQA     | Inference Latency (s) | 2.7    | 24   |
| Mathematical Reasoning             | AMC      | C_mem (Ratio)         | 0.1    | 24   |
| Mathematical Reasoning             | AIME24   | C_mem (Ratio)         | 22     | 24   |
| Mathematical Reasoning             | MATH500  | Inference Latency (s) | 1.2    | 24   |
| Mathematical Reasoning             | AMC      | Inference Latency (s) | 1.7    | 24   |
| Mathematical Reasoning             | AIME24   | Inference Latency (s) | 12.2   | 24   |
| Mathematical Reasoning             | AIME25   | Inference Latency (s) | 14.2   | 24   |
| Multi-task Language Understanding  | MMLU-Pro | Inference Latency (s) | 3.3    | 24   |
| Mathematical Reasoning             | AIME25   | C_mem (Ratio)         | 0.17   | 24   |
Showing 10 of 12 rows
