Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing

About

Parallel thinking has emerged as a promising paradigm for reasoning, yet it imposes significant computational burdens. Existing efficiency methods primarily rely on local, per-trajectory signals and lack principled mechanisms to exploit global dynamics across parallel branches. We introduce 2D probing, an interface that exposes the width-depth dynamics of parallel thinking by periodically eliciting intermediate answers from all branches. Our analysis reveals three key insights: non-monotonic scaling across width-depth allocations, heterogeneous reasoning branch lengths, and early stabilization of global consensus. Guided by these insights, we introduce Parallel-Probe, a training-free controller designed to optimize online parallel thinking. Parallel-Probe employs consensus-based early stopping to regulate reasoning depth and deviation-based branch pruning to dynamically adjust width. Extensive experiments across three benchmarks and multiple models demonstrate that Parallel-Probe establishes a superior Pareto frontier for test-time scaling. Compared to standard majority voting, it reduces sequential tokens by up to 35.8% and total token cost by over 25.8% while maintaining competitive accuracy.

Tong Zheng, Chengsong Huang, Runpeng Dai, Yun He, Rui Liu, Xin Ni, Huiwen Bao, Kaishen Wang, Hongtu Zhu, Jiaxin Huang, Furong Huang, Heng Huang • 2026
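
The two control signals named in the abstract, consensus-based early stopping (depth) and deviation-based branch pruning (width), can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `probe_controller`, the `probes[t][b]` input layout, and the `patience` and `prune_after` thresholds are all hypothetical choices made for illustration, and the stopping and pruning rules are simple stand-ins for whatever criteria the paper actually uses.

```python
from collections import Counter
from typing import List, Optional, Tuple


def majority(answers: List[Optional[str]]) -> Optional[str]:
    """Return the most common non-None answer, or None if there is none."""
    votes = Counter(a for a in answers if a is not None)
    return votes.most_common(1)[0][0] if votes else None


def probe_controller(
    probes: List[List[Optional[str]]],
    patience: int = 2,      # hypothetical: probe steps the consensus must persist
    prune_after: int = 2,   # hypothetical: consecutive deviations before pruning
) -> Tuple[Optional[str], int, List[int]]:
    """Toy controller over a 2D probe matrix (assumed layout).

    probes[t][b] is the intermediate answer elicited from branch b at
    probe step t. Returns (final answer, stop step, pruned branch ids).
    """
    if not probes:
        return None, -1, []
    active = set(range(len(probes[0])))
    deviation = {b: 0 for b in active}
    pruned: List[int] = []
    last, streak = None, 0

    for t, row in enumerate(probes):
        consensus = majority([row[b] for b in sorted(active)])

        # Depth control: stop once the global consensus has been stable
        # for `patience` consecutive probe steps.
        if consensus is not None and consensus == last:
            streak += 1
        else:
            streak = 1 if consensus is not None else 0
        last = consensus
        if streak >= patience:
            return consensus, t, pruned

        # Width control: prune branches that keep disagreeing with the
        # consensus, always keeping at least one branch alive.
        for b in sorted(active):
            if consensus is not None and row[b] != consensus:
                deviation[b] += 1
            else:
                deviation[b] = 0
            if deviation[b] >= prune_after and len(active) > 1:
                active.discard(b)
                pruned.append(b)

    return last, len(probes) - 1, pruned
```

In this sketch, a stable consensus ends generation for every branch at once, which is what reduces sequential tokens, while pruning persistently deviating branches reduces total tokens. A real controller would call the model to continue the surviving branches between probe steps rather than read from a precomputed matrix.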

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	HMMT25	Accuracy 47.1	119
Mathematical Reasoning	AIME25	Accuracy 76.9	41
Reasoning	AIME 25	Accuracy 76.9	40
Mathematical Reasoning	AIME24 search	Accuracy 81.5	24
Mathematical Reasoning	Avg. (held out)	Accuracy 62	24
Reasoning	AIME24	Accuracy 81.5	22
Reasoning	AIME24, AIME25, HMMT25 Average	Accuracy 68.5	20
