
Uncertainty-Aware Budget Allocation for Adaptive Test-Time Reasoning

About

Sampling multiple responses improves language model reasoning, but uniform compute allocation is inefficient: easy questions are over-sampled while hard questions remain under-explored. We propose Uncertainty-Aware Budget Allocation (UAB), a concave integer optimization framework that reallocates a fixed sampling budget based on per-question uncertainty estimated at no additional inference cost. In Phase 1, every question receives one generation; its average negative log-likelihood (ANLL), extracted directly from output log-probabilities, serves as a difficulty signal while the generation contributes to the final vote. In Phase 2, the remaining budget is allocated by a marginal-greedy algorithm that solves a concave coverage-maximization surrogate exactly: uncertain questions receive more sampling budget while confident questions receive fewer additional samples. Evaluated on six open-weight and black-box models spanning 1.5B to 27B parameters and five reasoning benchmarks covering math, logic, and preference tasks, UAB outperforms baselines by up to +3% in average accuracy and up to +5% on individual benchmarks, with the largest gains in low-resource settings, requiring no auxiliary model or additional LLM call. Code is publicly available at https://github.com/manhitv/UAB.

Manh Nguyen, Sunil Gupta, Hung Le • 2026
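The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). The function names `anll` and `allocate_budget`, the mapping `p = exp(-ANLL)` from uncertainty to a per-sample success probability, and the coverage surrogate `1 - (1 - p)^n` are all assumptions made here for illustration; the paper's actual surrogate may differ. Because the assumed per-question objective is concave in the number of samples, repeatedly giving the next sample to the question with the largest marginal gain yields an optimal integer allocation for this surrogate.

```python
import heapq
import math


def anll(token_logprobs):
    """Average negative log-likelihood of one generation (Phase 1 signal)."""
    return -sum(token_logprobs) / len(token_logprobs)


def allocate_budget(anlls, total_budget):
    """Greedy allocation of a fixed sampling budget (Phase 2 sketch).

    Assumptions (hypothetical, not taken from the paper):
      * p_i = exp(-ANLL_i) is treated as a per-sample success probability;
      * the surrogate for question i with n samples is 1 - (1 - p_i)**n.
    """
    num_questions = len(anlls)
    if total_budget < num_questions:
        raise ValueError("budget must cover one Phase 1 sample per question")

    probs = [math.exp(-a) for a in anlls]
    counts = [1] * num_questions  # every question already has one generation

    def marginal_gain(i):
        # Gain in the assumed surrogate from adding one more sample to i.
        return probs[i] * (1.0 - probs[i]) ** counts[i]

    # heapq is a min-heap, so gains are stored negated.
    heap = [(-marginal_gain(i), i) for i in range(num_questions)]
    heapq.heapify(heap)

    for _ in range(total_budget - num_questions):
        _, i = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, (-marginal_gain(i), i))

    return counts


if __name__ == "__main__":
    # Three questions: confident, uncertain, and in between.
    example_anlls = [0.05, 0.7, 0.3]
    print(allocate_budget(example_anlls, total_budget=9))
```

In this example the confident question (ANLL 0.05) ends up with fewer samples than the uncertain one (ANLL 0.7), which matches the qualitative behavior the abstract describes. Under the assumed surrogate, questions with a very small `p` would also receive small marginal gains, so a different uncertainty-to-gain mapping may be needed to reproduce the paper's results.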

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Logical Reasoning | Formal Logic | Accuracy 87.8 | 136 |
| Mathematical Reasoning | MATH 500 | Accuracy 75 | 116 |
| Science Question Answering | GPQA | Accuracy 55.8 | 69 |
| Reasoning | DeepScaleR | Accuracy 57.3 | 30 |
| Preference Modeling | HH-RLHF | Accuracy 60.8 | 30 |
