Dual-Dimensional Consistency: Balancing Budget and Quality in Adaptive Inference-Time Scaling

About

Large Language Models (LLMs) have demonstrated remarkable reasoning abilities. However, maximizing their potential through inference-time scaling involves a trade-off between sampling budget and reasoning quality. Current strategies remain inefficient because they typically treat sampling width and depth as orthogonal objectives: width-based consensus methods risk reinforcing hallucinations, while depth-based pruning mechanisms prematurely truncate complex yet valid reasoning chains. We therefore propose Dual-Dimensional Consistency (DDC), a unified framework that bridges path quality with adaptive termination. By coupling a Confidence-Weighted Bayesian protocol with Trend-Aware Stratified Pruning, our method concentrates computational resources on high-quality reasoning paths, filtering hallucinations while accelerating consensus. Evaluations across five benchmarks demonstrate that this approach reduces token consumption by a factor of more than 10 while maintaining or exceeding the accuracy of strong baselines across various LLMs.
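The abstract does not spell out the Confidence-Weighted Bayesian protocol or the pruning rule, so the following is only a rough illustration of the general idea of weighting sampled answers by confidence and stopping early once one answer dominates. It is plain confidence-weighted voting with an early-stop rule, not the authors' method. The function name `weighted_consensus`, the `threshold` and `min_samples` parameters, and the `(answer, confidence)` input format are all assumptions made for this sketch.

```python
from collections import defaultdict


def weighted_consensus(samples, threshold=0.9, min_samples=3):
    """Confidence-weighted voting with early stopping (illustrative only).

    samples: iterable of (answer, confidence) pairs, confidence in (0, 1].
    Returns (best_answer, weight_share, samples_used).
    All names and defaults here are hypothetical, not taken from the paper.
    """
    weights = defaultdict(float)  # accumulated confidence per distinct answer
    total = 0.0
    used = 0
    best, share = None, 0.0
    for answer, confidence in samples:
        used += 1
        weights[answer] += confidence
        total += confidence
        if total <= 0.0:
            continue  # no usable confidence mass yet
        best = max(weights, key=weights.get)
        share = weights[best] / total
        # Stop sampling once one answer holds enough of the confidence mass.
        if used >= min_samples and share >= threshold:
            break
    return best, share, used


# Usage: a lazy generator would let the early stop save real sampling cost.
stream = [("42", 0.9), ("42", 0.8), ("17", 0.1), ("42", 0.9), ("17", 0.2)]
answer, share, used = weighted_consensus(stream)
print(answer, round(share, 3), used)
```

With these inputs the loop stops after the third sample, because the leading answer already holds more than 90% of the accumulated confidence. A real system would replace the fixed threshold with whatever stopping criterion the paper defines.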

Rongman Xu, Yifei Li, Tianzhe Zhao, Yanrui Wu, Bo Li, Hang Yan • 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	AIME 24	Accuracy 93.3	358
Mathematical Reasoning	AMC 23	Pass@1 Accuracy 100	109
Science Question Answering	GPQA Diamond	Accuracy 69.6	84
Reasoning	Average (MATH500, AMC23, AIME24, AIME25, GPQA-d)	Accuracy 87.2	25
Mathematical Reasoning	AIME 25	Accuracy (AIME 25) 83.3	25
Science Reasoning	GPQA Diamond	Accuracy 72.2	25
Mathematical Reasoning	AIME 25	Accuracy 83.3	6

