
Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods

About

Large language models (LLMs) have achieved significant performance gains via scaling up model sizes and/or data. However, recent evidence suggests diminishing returns from such approaches, motivating scaling the computation spent at inference time. Existing inference-time scaling methods, usually with reward models, cast the task as a search problem, which tends to be vulnerable to reward hacking as a consequence of approximation errors in reward models. In this paper, we instead cast inference-time scaling as a probabilistic inference task and leverage sampling-based techniques to explore the typical set of the state distribution of a state-space model with an approximate likelihood, rather than optimize for its mode directly. We propose a novel inference-time scaling approach by adapting particle-based Monte Carlo methods to this task. Our empirical evaluation demonstrates that our methods have a 4-16x better scaling rate than our deterministic search counterparts on various challenging mathematical reasoning tasks. Using our approach, we show that Qwen2.5-Math-1.5B-Instruct can surpass GPT-4o accuracy in only 4 rollouts, while Qwen2.5-Math-7B-Instruct scales to o1-level accuracy in only 32 rollouts. Our work not only presents an effective method for inference-time scaling, but also connects the rich literature in probabilistic inference with inference-time scaling of LLMs to develop more robust algorithms in future work. Code, videos, and further information are available at https://probabilistic-inference-scaling.github.io.
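The abstract describes the method only at a high level. Below is a minimal, illustrative sketch of generic particle filtering over partial reasoning traces, written under stated assumptions: each particle is a list of reasoning steps, a reward function scoring a partial trace stands in for the approximate likelihood, and resampling weights are a softmax of those rewards. The names `propose_step`, `reward`, `is_done`, `toy_step`, and `toy_reward` are hypothetical placeholders for an LLM step sampler and a process reward model. This is not the authors' implementation; their code is linked from the project page above.

```python
import math
import random
from typing import Callable, List

# Hypothetical interfaces (assumptions, not from the paper):
#   propose_step(prefix, rng) -> next reasoning step (an LLM sampler in practice)
#   reward(prefix)            -> score of a partial trace (a reward model in practice)
StepFn = Callable[[List[str], random.Random], str]
RewardFn = Callable[[List[str]], float]


def softmax(xs: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; turns rewards into resampling weights."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def particle_filter(
    propose_step: StepFn,
    reward: RewardFn,
    is_done: Callable[[List[str]], bool],
    num_particles: int = 4,
    max_steps: int = 10,
    temperature: float = 1.0,
    seed: int = 0,
) -> List[str]:
    rng = random.Random(seed)
    particles: List[List[str]] = [[] for _ in range(num_particles)]
    for _ in range(max_steps):
        # 1. Propagate: extend every unfinished particle by one sampled step.
        particles = [p if is_done(p) else p + [propose_step(p, rng)] for p in particles]
        # 2. Weight: rewards act as unnormalized log-weights (an assumption here).
        weights = softmax([reward(p) for p in particles], temperature)
        # 3. Resample with replacement in proportion to the weights, so
        #    low-reward traces can still survive by chance.
        idx = rng.choices(range(num_particles), weights=weights, k=num_particles)
        particles = [list(particles[i]) for i in idx]
        if all(is_done(p) for p in particles):
            break
    # Simple final selection: return the highest-reward surviving particle.
    return max(particles, key=reward)


# Toy, hypothetical stand-ins so the sketch runs without a model.
def toy_step(prefix: List[str], rng: random.Random) -> str:
    return rng.choice(["good", "bad"])


def toy_reward(prefix: List[str]) -> float:
    return sum(s == "good" for s in prefix) / max(len(prefix), 1)


if __name__ == "__main__":
    best = particle_filter(toy_step, toy_reward, lambda p: len(p) >= 3,
                           num_particles=8, max_steps=5)
    print(best)
```

Resampling duplicates high-reward partial traces and drops low-reward ones while keeping some randomness, which is the contrast the abstract draws with deterministic search that keeps only the top-scoring candidates. Returning the highest-reward particle at the end is just one simple selection rule among several.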

Isha Puri, Shivchander Sudalairaj, Guangxuan Xu, Kai Xu, Akash Srivastava · 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | AIME 2024 | Pass@1 Accuracy | 10 | 165 |
| Mathematical Reasoning | MATH 500 | Top-1 Accuracy | 70.45 | 112 |
| Mathematical Reasoning | Omni-MATH | Accuracy | 10.25 | 93 |
| Mathematics | AIME 2024 | AIME 2024 Score (%) | 26.06 | 31 |
| Math | AIME 2025 | Top-1 Score | 21.61 | 26 |
| Mathematical Problem Solving | AIME 2025 | Top-1 Accuracy (%) | 20.25 | 26 |
| Mathematical Problem Solving | AIME 2024 | Top-1 Accuracy | 20.45 | 26 |
| Mathematical Reasoning | GSM8K (128 samples) | Top-1 Accuracy | 96.2 | 12 |
| Mathematical Reasoning | MATH500 (random subset of 128 samples) | Top-1 Accuracy | 70.31 | 12 |
| Mathematical Reasoning | DEEPMATH (128 samples) | Top-1 Accuracy | 34.37 | 12 |

10 of 12 rows shown.
