SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning

About

Recent advances in inference-time compute have significantly improved performance on complex tasks by generating long chains of thought (CoTs) using Large Reasoning Models (LRMs). However, this improved accuracy comes at the cost of high inference latency due to the length of generated reasoning sequences and the autoregressive nature of decoding. Our key insight in tackling these overheads is that LRM inference, and the reasoning that it embeds, is highly tolerant of approximations: complex tasks are typically broken down into simpler steps, each of which brings utility based on the semantic insight it provides for downstream steps rather than the exact tokens it generates. Accordingly, we introduce SpecReason, a system that automatically accelerates LRM inference by using a lightweight model to (speculatively) carry out simpler intermediate reasoning steps and reserving the costly base model only to assess (and potentially correct) the speculated outputs. Importantly, SpecReason's focus on exploiting the semantic flexibility of thinking tokens in preserving final-answer accuracy is complementary to prior speculation techniques, most notably speculative decoding, which demands token-level equivalence at each step. Across a variety of reasoning benchmarks, SpecReason achieves $1.4-3.0\times$ speedup over vanilla LRM inference while improving accuracy by $0.4-9.0\%$. Compared to speculative decoding without SpecReason, their combination yields an additional $8.8-58.0\%$ latency reduction. We open-source SpecReason at https://github.com/ruipeterpan/specreason.
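
The speculate-then-assess loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the SpecReason repository: the callables `small_step`, `base_step`, `base_score`, and `is_final`, the numeric score scale, and the `threshold` value are all hypothetical stand-ins for the lightweight model, the base model, and the base model's assessment of a speculated step. The toy "models" are scripted lists so the example runs without any LLM.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not the SpecReason API):
#   step function:  (question, steps so far) -> next reasoning step
#   score function: (question, steps so far, candidate step) -> quality score
StepFn = Callable[[str, List[str]], str]
ScoreFn = Callable[[str, List[str], str], float]


def speculative_reasoning(
    question: str,
    small_step: StepFn,
    base_step: StepFn,
    base_score: ScoreFn,
    is_final: Callable[[str], bool],
    threshold: float = 7.0,  # assumed acceptance cutoff on an assumed score scale
    max_steps: int = 32,
) -> Tuple[List[str], int]:
    """Return the reasoning steps and how many speculated steps were accepted."""
    steps: List[str] = []
    accepted = 0
    for _ in range(max_steps):
        # 1. The lightweight model speculates the next reasoning step.
        candidate = small_step(question, steps)
        # 2. The base model assesses the step for semantic usefulness,
        #    not for token-level equivalence with its own output.
        if base_score(question, steps, candidate) >= threshold:
            accepted += 1
        else:
            # 3. Rejected: the base model regenerates this step itself.
            candidate = base_step(question, steps)
        steps.append(candidate)
        if is_final(candidate):
            break
    return steps, accepted


# Toy stand-ins: the small model gets the second step wrong.
SMALL_SCRIPT = ["expand 3*(4+5)", "4+5 = 8", "answer: 27"]
BASE_SCRIPT = ["expand 3*(4+5)", "4+5 = 9", "answer: 27"]


def toy_small(question: str, steps: List[str]) -> str:
    return SMALL_SCRIPT[len(steps)]


def toy_base(question: str, steps: List[str]) -> str:
    return BASE_SCRIPT[len(steps)]


def toy_score(question: str, steps: List[str], candidate: str) -> float:
    # Scores a step highly only if it matches what the base model would say.
    return 9.0 if candidate == BASE_SCRIPT[len(steps)] else 2.0


def toy_final(step: str) -> bool:
    return step.startswith("answer:")


steps, accepted = speculative_reasoning(
    "What is 3*(4+5)?", toy_small, toy_base, toy_score, toy_final
)
print(steps)
print(accepted)
```

In this toy run, two of the three steps come from the lightweight model and only the faulty middle step is regenerated by the base model. A real scorer would judge semantic usefulness rather than compare strings as `toy_score` does.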

Rui Pan, Yinwei Dai, Zhihao Zhang, Gabriele Oliaro, Zhihao Jia, Ravi Netravali • 2025

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MATH500 (test)	Accuracy 92	895
Object Hallucination	POPE Popular	--	372
Mathematical Reasoning	MATH 500	Pass@1 Rate 78	236
Mathematical Reasoning	MATH 500	Accuracy 73.6	221
General Reasoning	MMLU-Pro	Accuracy 62.4	201
Object Hallucination Evaluation	POPE Adversarial	Accuracy 84.62	159
Object Hallucination Evaluation	POPE (Random)	Accuracy 90.27	152
High-Resolution Visual Perception	HR-Bench-8K	Accuracy 81.02	63
Code Reasoning	HumanEval	HumanEval Score 68	62
Multimodal Reasoning	RealworldQA	Accuracy 69.86	40

Showing 10 of 38 rows
