SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning

About

Recent advances in inference-time compute have significantly improved performance on complex tasks by generating long chains of thought (CoTs) using Large Reasoning Models (LRMs). However, this improved accuracy comes at the cost of high inference latency due to the length of generated reasoning sequences and the autoregressive nature of decoding. Our key insight in tackling these overheads is that LRM inference, and the reasoning that it embeds, is highly tolerant of approximations: complex tasks are typically broken down into simpler steps, each of which brings utility based on the semantic insight it provides for downstream steps rather than the exact tokens it generates. Accordingly, we introduce SpecReason, a system that automatically accelerates LRM inference by using a lightweight model to (speculatively) carry out simpler intermediate reasoning steps and reserving the costly base model only to assess (and potentially correct) the speculated outputs. Importantly, SpecReason's focus on exploiting the semantic flexibility of thinking tokens in preserving final-answer accuracy is complementary to prior speculation techniques, most notably speculative decoding, which demands token-level equivalence at each step. Across a variety of reasoning benchmarks, SpecReason achieves a $1.4\times$ to $3.0\times$ speedup over vanilla LRM inference while improving accuracy by $0.4\%$ to $9.0\%$. Compared to speculative decoding without SpecReason, their combination yields an additional $8.8\%$ to $58.0\%$ latency reduction. We open-source SpecReason at https://github.com/ruipeterpan/specreason.
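
The step-level speculation described in the abstract can be sketched as a simple control loop. This is an illustrative sketch only, not the authors' implementation: the function names (`speculative_reasoning`, `small_step`, `base_step`, `base_score`, `is_final`), the scalar acceptance `threshold`, and the toy stand-in models are all assumptions made for the example. The abstract states only that a lightweight model drafts intermediate steps and the base model assesses and potentially corrects them.

```python
from typing import Callable, List

Steps = List[str]


def speculative_reasoning(
    question: str,
    small_step: Callable[[str, Steps], str],        # hypothetical: lightweight model drafts a step
    base_step: Callable[[str, Steps], str],         # hypothetical: base model writes a step itself
    base_score: Callable[[str, Steps, str], float], # hypothetical: base model rates a drafted step
    is_final: Callable[[str], bool],                # hypothetical: detects the final answer
    threshold: float = 0.7,                         # assumed acceptance cutoff, not from the paper
    max_steps: int = 32,
) -> Steps:
    """Build a chain of reasoning steps, drafting each one with the small model."""
    steps: Steps = []
    for _ in range(max_steps):
        draft = small_step(question, steps)
        if base_score(question, steps, draft) >= threshold:
            step = draft                        # draft judged good enough: keep it
        else:
            step = base_step(question, steps)   # draft rejected: base model regenerates the step
        steps.append(step)
        if is_final(step):
            break
    return steps


# Toy stand-ins so the loop can be run without any real model.
def toy_small(question: str, steps: Steps) -> str:
    n = len(steps) + 1
    if n == 2:
        return "bad step 2"                     # simulate a poor draft
    return "step 1" if n == 1 else "answer: 4"


def toy_base(question: str, steps: Steps) -> str:
    return f"base step {len(steps) + 1}"


def toy_score(question: str, steps: Steps, draft: str) -> float:
    return 0.0 if draft.startswith("bad") else 1.0


def toy_is_final(step: str) -> bool:
    return step.startswith("answer")


trace = speculative_reasoning("What is 2 + 2?", toy_small, toy_base, toy_score, toy_is_final)
print(trace)
```

In this toy run the second drafted step is rejected and replaced by the base model's step, while the first and third drafts are accepted. The acceptance test is on the quality of a whole step rather than on token-level agreement, which is the distinction the abstract draws with speculative decoding.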

Rui Pan, Yinwei Dai, Zhihao Zhang, Gabriele Oliaro, Zhihao Jia, Ravi Netravali • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | MATH500 (test) | Accuracy: 92 | 514 |
| Object Hallucination | POPE Popular | -- | 273 |
| Mathematical Reasoning | MATH 500 | Pass@1 Rate: 78 | 76 |
| Object Hallucination Evaluation | POPE Adversarial | Accuracy: 84.62 | 55 |
| High-Resolution Visual Perception | HR-Bench-8K | Accuracy: 81.02 | 24 |
| Mathematical Reasoning | MATH 500 | Pass@1 Acc: 84 | 18 |
| General MLLM Evaluation | Average V* HR-Bench POPE | Accuracy: 83.78 | 13 |
| Visual Grounding | V* Direct Attributes 52 | Accuracy: 89.57 | 13 |
| Object Hallucination Evaluation | POPE (Random) | Accuracy: 90.27 | 13 |
| Visual Grounding | V* Relative Position 52 | Accuracy: 75 | 13 |
Showing 10 of 18 rows
