Martingale Foresight Sampling: A Principled Approach to Inference-Time LLM Decoding
About
Standard autoregressive decoding in large language models (LLMs) is inherently short-sighted, often failing to find globally optimal reasoning paths because generation proceeds token by token. While inference-time strategies like foresight sampling attempt to mitigate this by simulating future steps, they typically rely on ad-hoc heuristics for valuing paths and pruning the search space. This paper introduces Martingale Foresight Sampling (MFS), a principled framework that reformulates LLM decoding as the problem of identifying an optimal stochastic process. By modeling the quality of a reasoning path as a stochastic process, we leverage martingale theory to design a theoretically grounded algorithm. Our approach replaces heuristic mechanisms with principles from probability theory: step valuation is derived from the Doob Decomposition Theorem to measure a path's predictable advantage, path selection uses the Optional Stopping Theorem for principled pruning of suboptimal candidates, and an adaptive stopping rule based on the Martingale Convergence Theorem terminates exploration once a path's quality has provably converged. Experiments on six reasoning benchmarks demonstrate that MFS surpasses state-of-the-art methods in accuracy while significantly improving computational efficiency. Code will be released at https://github.com/miraclehetech/EACL2026-Martingale-Foresight-Sampling.
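The three martingale components described above can be illustrated with a toy sketch. This is not the released implementation: the path representation, the `rollout` callables (stand-ins for LLM foresight simulations), and all thresholds are hypothetical, chosen only to show how a Doob-style predictable advantage can drive optional-stopping pruning and a convergence-based halt.

```python
import random

def estimate_advantage(rollout_scores, prev_score):
    """Estimate the predictable one-step advantage A_t - A_{t-1}, i.e. the
    conditional expectation E[X_t - X_{t-1} | F_{t-1}] from the Doob
    decomposition, by averaging foresight rollout scores."""
    return sum(rollout_scores) / len(rollout_scores) - prev_score

def mfs_select(paths, n_rollouts=4, prune_threshold=-0.05,
               converge_eps=0.02, max_steps=20, seed=0):
    """Toy MFS-style loop over candidate reasoning paths.

    Each path is a dict with a 'score' (current quality estimate) and a
    'rollout' callable returning a simulated future score. Paths whose
    predictable advantage drops below `prune_threshold` are stopped early
    (optional-stopping pruning); the loop halts once the best path's
    advantage is within `converge_eps` of zero (a stand-in for the
    martingale-convergence stopping rule). All names and thresholds
    here are illustrative assumptions, not the paper's algorithm.
    """
    rng = random.Random(seed)
    active = list(paths)
    for _ in range(max_steps):
        best_adv = None
        survivors = []
        for p in active:
            rollouts = [p["rollout"](rng) for _ in range(n_rollouts)]
            adv = estimate_advantage(rollouts, p["score"])
            p["score"] += adv  # advance along the estimated predictable drift
            if adv >= prune_threshold:
                survivors.append(p)
            if best_adv is None or adv > best_adv:
                best_adv = adv
        active = survivors or active[:1]  # never prune every path
        if abs(best_adv) < converge_eps:  # quality estimate has converged
            break
    return max(active, key=lambda p: p["score"])

# Hypothetical demo: path A's rollouts drift toward 0.9, path B's toward 0.2,
# so B is pruned early and A is selected once its score stabilizes.
demo_paths = [
    {"name": "A", "score": 0.5,
     "rollout": lambda rng: 0.9 + rng.uniform(-0.01, 0.01)},
    {"name": "B", "score": 0.5,
     "rollout": lambda rng: 0.2 + rng.uniform(-0.01, 0.01)},
]
best = mfs_select(demo_paths)
```

In this sketch the low-drift path B shows a large negative advantage on the first step and is pruned, while path A's advantage shrinks toward zero as its score approaches the rollout mean, triggering the convergence-based stop.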
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Logical reasoning | ReClor (test) | Accuracy | 63.78 | 87 |
| Reasoning | ARC Challenge | Accuracy | 85.05 | 70 |
| Arithmetic reasoning | GSM8K | Pass@1 | 87.64 | 14 |
| Common sense reasoning | ARC Challenge | Pass@1 | 86.73 | 14 |
| Graduate-level factual reasoning | GPQA | Pass@1 | 34.9 | 14 |
| Logical reasoning | LogiQA | Pass@1 Accuracy | 48.61 | 14 |
| Mathematical reasoning | MATH 500 | Pass@1 | 38.2 | 14 |