Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees

About

Self-consistency boosts inference-time performance by sampling multiple reasoning traces in parallel and voting. However, in constrained domains like math and code, this strategy is compute-inefficient because it samples with replacement, repeatedly revisiting the same high-probability prefixes and duplicate completions. We propose Distinct Leaf Enumeration (DLE), a deterministic decoding method that treats truncated sampling as traversal of a pruned decoding tree and systematically enumerates distinct leaves instead of sampling with replacement. This strategy improves inference efficiency in two ways. Algorithmically, it increases coverage of the truncated search space under a fixed budget by exploring previously unvisited high-probability branches. Systemically, it reuses shared prefixes and reduces redundant token generation. Empirically, DLE explores higher-quality reasoning traces than stochastic self-consistency, yielding better performance on math, coding, and general reasoning tasks.
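As a rough illustration of the idea in the abstract, the sketch below enumerates distinct leaves of a truncated decoding tree on a toy model. It is not the authors' implementation: the best-first traversal order, the top-k truncation rule, and every name in it (`toy_next_token_probs`, `truncate_top_k`, `enumerate_distinct_leaves`, `budget`, `max_len`) are assumptions made for this example.

```python
import heapq
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"
Prefix = Tuple[str, ...]


def toy_next_token_probs(prefix: Prefix) -> Dict[str, float]:
    """Hypothetical stand-in for a language model's next-token distribution."""
    if len(prefix) >= 2:
        return {EOS: 1.0}
    return {"a": 0.6, "b": 0.3, "c": 0.1}


def truncate_top_k(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most probable tokens and renormalize (assumed truncation rule)."""
    top = sorted(probs.items(), key=lambda kv: -kv[1])[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}


def enumerate_distinct_leaves(
    next_probs: Callable[[Prefix], Dict[str, float]],
    budget: int,
    k: int = 2,
    max_len: int = 8,
) -> List[Tuple[Prefix, float]]:
    """Return up to `budget` distinct completions of the truncated tree.

    Prefixes are expanded best-first by probability under the truncated
    distribution, so no leaf is visited twice and each shared prefix is
    expanded only once.
    """
    heap: List[Tuple[float, Prefix]] = [(-1.0, ())]  # (negative probability, prefix)
    leaves: List[Tuple[Prefix, float]] = []
    while heap and len(leaves) < budget:
        neg_p, prefix = heapq.heappop(heap)
        if prefix and (prefix[-1] == EOS or len(prefix) >= max_len):
            leaves.append((prefix, -neg_p))  # a finished, never-seen-before leaf
            continue
        for tok, p in truncate_top_k(next_probs(prefix), k).items():
            heapq.heappush(heap, (neg_p * p, prefix + (tok,)))
    return leaves


if __name__ == "__main__":
    for seq, prob in enumerate_distinct_leaves(toy_next_token_probs, budget=4):
        print(" ".join(seq), round(prob, 4))
```

With a budget of four, the toy tree (top-2 truncation over two steps) is covered completely, whereas four independent samples from the same truncated distribution could return the same completion more than once. In a real system the expansion of each prefix would correspond to a forward pass that reuses cached state for that prefix; that part is not modeled here.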

Xueyan Li, Johannes Zenn, Ekaterina Fadeeva, Guinan Su, Mrinmaya Sachan, Jonas Geiping • 2026

Related benchmarks

Task	Dataset	Result
General Knowledge	MMLU-Pro	Accuracy (maj@4) 35.88	21
Multi-task Language Understanding	MMLU-Pro	Accuracy (maj@2) 54.88	18
Reasoning	GSM8K	Accuracy (maj@4) 41.17	12
Code Generation	HumanEval	Pass@2 59.15	9
Mathematical Reasoning	GSM8K	Accuracy (maj@2) 64.52	9
Mathematical Reasoning	GSM8K	Accuracy (maj@2) 33.21	9
Multiple-choice Question Answering	MMLU-Pro	Accuracy (maj@2) 46.89	9
