
Test-Time Speculation

About

Speculative decoding accelerates LLM inference by using a fast draft model to generate tokens and a more accurate target model to verify them. Its performance depends on the acceptance length, the number of draft tokens accepted by the target per verification step. Our studies show that the acceptance length of even state-of-the-art speculators, such as DFlash, EAGLE-3, and PARD, degrades with generation length, approaching 1 (i.e., no speedup) within just a few thousand output tokens, which makes speculators ineffective for long-response tasks. Acceptance lengths decline because most speculators are trained offline on short sequences but must match the target model on much longer outputs at inference time, well beyond their training distribution. To address this issue, we propose Test-Time Speculation (TTS), an online distillation approach that continuously adapts the speculator at test time. TTS leverages a key insight: the verification step already invokes the target model on every draft token, so it provides the training signal needed to adapt the draft at no additional cost. Treating the draft as the student and the target as the teacher, TTS updates the draft over successive speculation rounds, with each update improving the draft's accuracy as generation proceeds. Across multiple models from the Qwen-3, Qwen-3.5, and Llama3.1 families, TTS improves acceptance lengths over state-of-the-art speculators by up to 72% and by 41% on average, with the benefits growing as generation length increases.
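The mechanism described above can be illustrated with a toy, pure-Python sketch. It is not the authors' implementation: the target (`target_next`), the count-based draft (`BigramDraft`), and every other name below are hypothetical. Verification is greedy (exact token match) rather than the rejection-sampling rule used with sampled decoding, and a hard-label count update stands in for the gradient-based distillation a neural draft would need. Acceptance length is measured as tokens emitted per verification round (accepted draft tokens plus the target's own token), so a value of 1 means no speedup.

```python
from collections import Counter, defaultdict

VOCAB = 50  # toy vocabulary size (hypothetical)


def target_next(prev: int) -> int:
    """Hypothetical target model: its greedy next token depends only on the previous token."""
    return (prev + 7) % VOCAB


def target_rollout(first_tok: int, n: int) -> list[int]:
    """Greedy decoding with the target alone (the reference output)."""
    toks = [first_tok]
    for _ in range(n):
        toks.append(target_next(toks[-1]))
    return toks


class BigramDraft:
    """Hypothetical draft model: predicts the most frequent successor it has seen for a token."""

    def __init__(self, seed_pairs):
        self.counts = defaultdict(Counter)
        for prev, nxt in seed_pairs:
            self.counts[prev][nxt] += 1

    def predict(self, prev: int) -> int:
        seen = self.counts[prev]
        # Unseen context: the draft guesses "repeat the previous token" (always wrong here).
        return seen.most_common(1)[0][0] if seen else prev

    def update(self, prev: int, teacher_tok: int) -> None:
        # Hard-label stand-in for a distillation step toward the target (teacher).
        self.counts[prev][teacher_tok] += 1


def speculate_round(draft: BigramDraft, last_tok: int, k: int, adapt: bool) -> list[int]:
    # 1. The draft proposes k tokens autoregressively.
    proposal, prev = [], last_tok
    for _ in range(k):
        prev = draft.predict(prev)
        proposal.append(prev)
    # 2. The target scores all k + 1 positions (a single batched forward pass in practice).
    prefixes = [last_tok] + proposal
    target_toks = [target_next(p) for p in prefixes]
    # 3. Accept the longest matching prefix, then emit the target's own token
    #    (a correction at the first mismatch, or a bonus token if all k match).
    n_accept = 0
    while n_accept < k and proposal[n_accept] == target_toks[n_accept]:
        n_accept += 1
    emitted = proposal[:n_accept] + [target_toks[n_accept]]
    # 4. TTS-style adaptation: the verification outputs double as free teacher labels.
    #    In this toy the target's choice depends only on the previous token,
    #    so every verified position yields a valid (context, label) pair.
    if adapt:
        for p, t in zip(prefixes, target_toks):
            draft.update(p, t)
    return emitted


def generate(draft: BigramDraft, first_tok: int, n_tokens: int, k: int = 4, adapt: bool = False):
    """Returns the generated tokens and the mean tokens emitted per verification round."""
    out, rounds = [first_tok], 0
    while len(out) - 1 < n_tokens:
        emitted = speculate_round(draft, out[-1], k, adapt)
        out.extend(emitted[: n_tokens - (len(out) - 1)])  # do not overshoot n_tokens
        rounds += 1
    return out, n_tokens / rounds


# "Offline training on short sequences": the draft only sees the first 10 target transitions.
short = target_rollout(0, 10)
seed = list(zip(short, short[1:]))

static_out, static_acc = generate(BigramDraft(seed), 0, 1000, adapt=False)
tts_out, tts_acc = generate(BigramDraft(seed), 0, 1000, adapt=True)
print(f"static draft: {static_acc:.2f} tokens/round, adapted draft: {tts_acc:.2f} tokens/round")
# → static draft: 1.19 tokens/round, adapted draft: 4.31 tokens/round
```

Both runs produce exactly the target's greedy output, because verification never lets a wrong draft token through; only the number of target calls differs. The static draft, seeded from a 10-token rollout, accepts little once generation leaves that short context, while the adapted draft learns each missed transition from the verification pass it already had to run.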

Avinash Kumar, Sujay Sanghavi, Poulami Das · 2026

Related benchmarks

Task                               Dataset        Acceptance Length  Rank
Mathematical Reasoning             OlyBench Math  4.7                12
Physics Reasoning                  OlyBench Phy   4.5                12
STEM Theorem Question Answering    TheoremQA      4.4                12
Code Generation                    LiveCodeBench  4.7                6
Code Generation                    LiveCodeBench  2.1                6
Graduate-level Question Answering  GPQA           2                  6
Mathematical Reasoning             AIME 2025      4.9                6
Mathematical Reasoning             MATH 500       4.5                6
Science Question Answering         GPQA           5.2                6
Mathematical Reasoning             AIME 2024      4.6                6

(10 of 11 rows shown)
