
Dual-Dimensional Consistency: Balancing Budget and Quality in Adaptive Inference-Time Scaling

About

Large Language Models (LLMs) have demonstrated remarkable reasoning abilities. However, maximizing their potential through inference-time scaling requires a trade-off between sampling budget and reasoning quality. Current strategies remain inefficient because they typically treat sampling width and depth as orthogonal objectives: width-based consensus methods risk reinforcing hallucinations, while depth-based pruning mechanisms prematurely truncate complex yet valid reasoning chains. We therefore propose Dual-Dimensional Consistency (DDC), a unified framework that couples path quality with adaptive termination. By combining a Confidence-Weighted Bayesian protocol with Trend-Aware Stratified Pruning, our method concentrates computational resources on high-quality reasoning paths, filtering hallucinations while accelerating consensus. Evaluations across five benchmarks show that this approach reduces token consumption by a factor of more than 10 while maintaining or exceeding the accuracy of strong baselines across various LLMs.
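
To make the idea of confidence-weighted consensus with adaptive termination more concrete, the following is a minimal illustrative sketch. It is not the DDC algorithm from the paper: the abstract does not specify the Bayesian update or the pruning rule, so this example substitutes a plain confidence-weighted vote with a share-based stopping rule. The function name `weighted_vote_with_early_stop`, the `threshold` and `min_samples` parameters, and the `(answer, confidence)` sample format are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def weighted_vote_with_early_stop(
    samples: Iterable[Tuple[str, float]],
    threshold: float = 0.8,   # hypothetical: required weight share of the leader
    min_samples: int = 3,     # hypothetical: never stop before this many paths
) -> Tuple[Optional[str], int]:
    """Return (leading answer, number of samples consumed).

    Each sample is an (answer, confidence) pair from one reasoning path.
    Sampling stops once the leading answer holds at least `threshold`
    of the total confidence mass, which saves the remaining budget.
    """
    weights = defaultdict(float)  # accumulated confidence per answer
    total = 0.0
    used = 0
    leader = None
    for answer, confidence in samples:
        used += 1
        weights[answer] += confidence
        total += confidence
        leader = max(weights, key=weights.get)
        # Adaptive termination: stop when consensus is strong enough.
        if used >= min_samples and total > 0 and weights[leader] / total >= threshold:
            break
    return leader, used


# Toy input: five sampled paths, of which only three are needed.
paths = [("42", 0.9), ("42", 0.8), ("17", 0.2), ("42", 0.9), ("17", 0.1)]
print(weighted_vote_with_early_stop(paths))  # ('42', 3)
```

In this toy run the vote stops after three of the five paths, because the answer "42" already holds about 89% of the accumulated confidence. Low-confidence paths contribute little weight, which is one simple way a width-based consensus can avoid being dominated by many weak, possibly hallucinated, answers.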

Rongman Xu, Yifei Li, Tianzhe Zhao, Yanrui Wu, Bo Li, Hang Yan • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | AIME 24 | Accuracy | 93.3 | 318 |
| Mathematical Reasoning | AMC 23 | Pass@1 Accuracy | 100 | 109 |
| Science Question Answering | GPQA Diamond | Accuracy | 69.6 | 59 |
| Reasoning | Average (MATH500, AMC23, AIME24, AIME25, GPQA-d) | Accuracy | 87.2 | 25 |
| Mathematical Reasoning | AIME 25 | Accuracy (AIME 25) | 83.3 | 25 |
| Science Reasoning | GPQA Diamond | Accuracy | 72.2 | 25 |
| Mathematical Reasoning | AIME 25 | Accuracy | 83.3 | 6 |
