When to Vote, When to Rewrite: Disagreement-Guided Strategy Routing for Test-Time Scaling

About

Large Reasoning Models (LRMs) achieve strong performance on mathematical reasoning tasks but remain unreliable on challenging instances. Existing test-time scaling methods, such as repeated sampling, self-correction, and tree search, improve performance at the cost of increased computation, yet often exhibit diminishing returns on hard problems. We observe that output disagreement is strongly correlated with instance difficulty and prediction correctness, providing a useful signal for guiding instance-level strategy selection at test time. Based on this insight, we propose a training-free framework that formulates test-time scaling as an instance-level routing problem: rather than allocating more computation within a single strategy, it dynamically selects among different scaling strategies based on output disagreement. The framework applies lightweight resolution for consistent cases, majority voting for moderate disagreement, and rewriting-based reformulation for highly ambiguous instances. Experiments on seven mathematical benchmarks and three models show that our method improves accuracy by 3% to 7% while reducing sampling cost compared to existing approaches.

Zhimin Lin, Yixin Ji, Jinpeng Li, Yu Luo, Dong Li, Junhua Fang, Juntao Li, Min Zhang • 2026
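
The three-way routing described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the disagreement metric (share of samples that differ from the most common answer), the two thresholds, the sample count, and the `sample` and `rewrite` callables are all assumptions introduced here, and "lightweight resolution" is interpreted simply as returning the agreed answer.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical cut-offs; the abstract gives no numeric thresholds.
LOW_DISAGREEMENT = 0.0
HIGH_DISAGREEMENT = 0.5


def disagreement(answers: List[str]) -> float:
    """Share of samples that differ from the most common answer (assumed metric)."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    top_count = Counter(answers).most_common(1)[0][1]
    return 1.0 - top_count / len(answers)


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def route(
    question: str,
    sample: Callable[[str], str],   # hypothetical: one model call -> final answer
    rewrite: Callable[[str], str],  # hypothetical: reformulates the question
    n_samples: int = 4,
) -> Tuple[str, str]:
    """Pick a strategy per instance from the disagreement among sampled answers."""
    answers = [sample(question) for _ in range(n_samples)]
    d = disagreement(answers)
    if d <= LOW_DISAGREEMENT:
        # Consistent case: all samples agree, so return the shared answer.
        return answers[0], "consistent"
    if d <= HIGH_DISAGREEMENT:
        # Moderate disagreement: aggregate by majority vote.
        return majority_vote(answers), "vote"
    # High disagreement: rewrite the question, resample, then vote.
    rewritten = rewrite(question)
    new_answers = [sample(rewritten) for _ in range(n_samples)]
    return majority_vote(new_answers), "rewrite"
```

Keeping `sample` and `rewrite` as plain callables leaves the sketch independent of any particular model API; in practice both would wrap calls to the reasoning model.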

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Mathematical Reasoning | AIME 24 | Accuracy: 80 | 318 |
| Code Generation | MBPP+ | Accuracy: 73.5 | 236 |
| Code Generation | HumanEval | Accuracy: 89 | 217 |
| Mathematical Reasoning | AMC23 | PASS@1 Accuracy: 97.5 | 207 |
| Mathematical Reasoning | GSM8K | -- | 204 |
| Mathematical Reasoning | Olympiad | Accuracy: 0.673 | 134 |
| Mathematical Reasoning | AIME25 | Accuracy (ACC): 70 | 32 |
| Mathematical Reasoning | Mathematical Reasoning Suite (Math500, Gaokao En, Olympiad, GSM8K, AMC23, AIME25, AIME24) | Math500 Score: 92.8 | 18 |