Reward-Guided Speculative Decoding for Efficient LLM Reasoning

About

We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs). RSD synergistically combines a lightweight draft model with a more powerful target model, incorporating a controlled bias to prioritize high-reward outputs, in contrast to existing speculative decoding methods that enforce strict unbiasedness. RSD employs a process reward model to evaluate intermediate decoding steps and dynamically decide whether to invoke the target model, optimizing the trade-off between computational cost and output quality. We theoretically demonstrate that a threshold-based mixture strategy achieves an optimal balance between resource utilization and performance. Extensive evaluations on challenging reasoning benchmarks, including Olympiad-level tasks, show that RSD delivers significant efficiency gains against decoding with the target model only (up to 4.4x fewer FLOPs), while achieving significant better accuracy than parallel decoding method on average (up to +3.5). These results highlight RSD as a robust and cost-effective approach for deploying LLMs in resource-intensive scenarios. The code is available at https://github.com/BaohaoLiao/RSD.

Baohao Liao, Yuhui Xu, Hanze Dong, Junnan Li, Christof Monz, Silvio Savarese, Doyen Sahoo, Caiming Xiong• 2025

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MATH500 (test)	Accuracy87.1	895
Mathematical Reasoning	MATH 500	Accuracy (Acc)80.4	543
Multi-task Language Understanding	MMLU	MMLU Accuracy69.6	442
Multitask Language Understanding	MMLU	Accuracy52.9	263
Mathematical Reasoning	OlympiadBench	Accuracy41.7	213
Mathematical Reasoning	AMC 23	Accuracy67.5	198
Grade School Math Reasoning	GSM8K	Accuracy (GSM8K)91.6	138
Mathematical Reasoning	Minerva	--	138
Mathematical Reasoning	Olympiad	Accuracy46.5	137
Mathematical Reasoning	AIME 24	AIME 24 Accuracy16.67	84

Showing 10 of 27 rows

Other info

Follow for update

@wizwand_team Discord