Guided Speculative Inference for Efficient Test-Time Alignment of LLMs
About
We propose Guided Speculative Inference (GSI), a novel algorithm for efficient reward-guided decoding in large language models. GSI combines soft best-of-$n$ test-time scaling with a reward model $r(x,y)$ and speculative samples from a small auxiliary model $\pi_S(y\mid x)$. We provably approximate both the optimal tilted policy $\pi_{\beta,B}(y\mid x) \propto \pi_B(y\mid x)\exp(\beta\,r(x,y))$ of soft best-of-$n$ under the base model $\pi_B$ and the expected reward under the optimal policy. In experiments on reasoning benchmarks (MATH500, OlympiadBench, Minerva Math, MMLU-STEM, GSM8K) and across different model families, our method achieves higher accuracy than both standard soft best-of-$n$ with $\pi_S$ and reward-guided speculative decoding (Liao et al., 2025), and in certain settings even outperforms soft best-of-$n$ with $\pi_B$, while reducing end-to-end latency by up to $28\%$. The code is available at https://github.com/j-geuter/GSI.
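To make the tilted policy concrete, here is a minimal sketch of plain soft best-of-$n$ selection, the baseline that GSI builds on. It assumes the $n$ candidates have already been sampled from some model and scored by a reward model; a candidate is then drawn with probability proportional to $\exp(\beta\,r(x,y_i))$. The function names `soft_best_of_n_weights` and `soft_best_of_n` are hypothetical and are not taken from the GSI repository, and the sketch deliberately omits the speculative part of GSI (sampling from $\pi_S$ and relating those samples to $\pi_B$), which the abstract does not specify.

```python
import math
import random
from typing import List, Optional, Sequence


def soft_best_of_n_weights(rewards: Sequence[float], beta: float) -> List[float]:
    """Return selection probabilities proportional to exp(beta * r_i).

    The maximum scaled reward is subtracted before exponentiating so that
    large values of beta * r_i do not overflow.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    scaled = [beta * r for r in rewards]
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_best_of_n(
    candidates: Sequence[str],
    rewards: Sequence[float],
    beta: float,
    rng: Optional[random.Random] = None,
) -> str:
    """Draw one candidate with probability proportional to exp(beta * reward)."""
    if len(candidates) != len(rewards):
        raise ValueError("candidates and rewards must have the same length")
    rng = rng or random.Random()
    weights = soft_best_of_n_weights(rewards, beta)
    return rng.choices(list(candidates), weights=weights, k=1)[0]


# Illustrative values only: three made-up candidates with made-up rewards.
candidates = ["answer A", "answer B", "answer C"]
rewards = [0.1, 0.9, 0.5]
picked = soft_best_of_n(candidates, rewards, beta=5.0, rng=random.Random(0))
```

With $\beta = 0$ the selection is uniform over the candidates, and as $\beta$ grows it approaches hard best-of-$n$ (always picking the highest-reward candidate). The abstract's comparison between soft best-of-$n$ with $\pi_S$ and with $\pi_B$ corresponds to which model produced `candidates`.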
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | MATH 500 | Accuracy (Acc) | 84.1 | 543 |
| Multi-task Language Understanding | MMLU | MMLU Accuracy | 78.3 | 442 |
| Multitask Language Understanding | MMLU | Accuracy | 54.1 | 263 |
| Mathematical Reasoning | OlympiadBench | Accuracy | 42 | 213 |
| Grade School Math Reasoning | GSM8K | Accuracy (GSM8K) | 94.3 | 138 |
| Mathematical Reasoning | OlympiadBench | Accuracy | 40.8 | 82 |
| General Reasoning | Average (MATH500, OlympiadBench, Minerva, MMLU, GSM8K) | Average Accuracy | 67.2 | 20 |