Uncertainty-Aware Budget Allocation for Adaptive Test-Time Reasoning

About

Sampling multiple responses improves language model reasoning, but uniform compute allocation is inefficient: easy questions are over-sampled while hard questions remain under-explored. We propose Uncertainty-Aware Budget Allocation (UAB), a concave integer optimization framework that reallocates a fixed sampling budget based on per-question uncertainty estimated at no additional inference cost. In Phase 1, every question receives one generation; its average negative log-likelihood (ANLL), extracted directly from output log-probabilities, serves as a difficulty signal, and the generation itself still contributes to the final vote. In Phase 2, the remaining budget is allocated by a marginal-greedy algorithm that solves a concave coverage-maximization surrogate exactly: uncertain questions receive more of the remaining budget, while confident questions receive fewer additional samples. Evaluated on six open-weight and black-box models spanning 1.5B to 27B parameters and on five reasoning benchmarks covering math, logic, and preference tasks, UAB outperforms baselines by up to +3% in average accuracy and up to +5% on individual benchmarks, with the largest gains in low-resource settings. It requires no auxiliary model and no additional LLM calls. Code is publicly available at https://github.com/manhitv/UAB.
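
The two phases described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact surrogate, so the sketch assumes that p_i = exp(-ANLL_i) stands in for the per-sample success probability of question i and that coverage is 1 - (1 - p_i)^n_i. The function names `anll` and `allocate` are hypothetical. Because the assumed objective is a sum of concave functions, repeatedly giving the next sample to the question with the largest marginal gain is optimal for this integer problem.

```python
import heapq
import math


def anll(token_logprobs):
    """Average negative log-likelihood of one generation's tokens."""
    return -sum(token_logprobs) / len(token_logprobs)


def allocate(anlls, total_budget):
    """Split total_budget samples across questions by marginal-greedy.

    Assumption (not taken from the paper): p_i = exp(-ANLL_i) is used as a
    proxy for per-sample success probability, and the objective is
    sum_i 1 - (1 - p_i) ** n_i, which is concave in each n_i.
    """
    n_questions = len(anlls)
    if total_budget < n_questions:
        raise ValueError("budget must cover one Phase 1 sample per question")

    # Phase 1: every question already has exactly one generation.
    counts = [1] * n_questions

    # Marginal gain of sample n+1 for a question with n samples is
    # p * (1 - p) ** n. heapq is a min-heap, so gains are stored negated.
    heap = []
    for i, a in enumerate(anlls):
        p = math.exp(-a)
        heapq.heappush(heap, (-(p * (1.0 - p) ** counts[i]), i, p))

    # Phase 2: hand out the remaining budget one sample at a time.
    for _ in range(total_budget - n_questions):
        _, i, p = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, (-(p * (1.0 - p) ** counts[i]), i, p))
    return counts


if __name__ == "__main__":
    # Hypothetical ANLL values: one confident and one uncertain question.
    print(allocate([0.05, 1.0], total_budget=6))
```

In this toy run the confident question (ANLL 0.05) keeps only its Phase 1 sample, while the uncertain one (ANLL 1.0) receives the rest of the budget. Under the assumed surrogate, a question whose p is very close to zero also starts with a small marginal gain, so the paper's actual objective may treat such cases differently.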

Manh Nguyen, Sunil Gupta, Hung Le • 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MATH 500	Accuracy: 75	183
Logical Reasoning	Formal Logic	Accuracy: 87.8	136
Science Question Answering	GPQA	Accuracy: 55.8	69
Reasoning	DeepScaleR	Accuracy: 57.3	30
Preference Modeling	HH-RLHF	Accuracy: 60.8	30

