Test-Time Speculation

About

Speculative decoding accelerates LLM inference by using a fast draft model to generate tokens and a more accurate target model to verify them. Its performance depends on the $\textit{acceptance length}$, the number of draft tokens accepted by the target. Our studies show that the acceptance length of even state-of-the-art speculators, such as DFlash, EAGLE-3, and PARD, degrades with generation length, reaching values close to 1 (i.e., no speedup) within just a few thousand output tokens, which makes these speculators ineffective for long-response tasks. Acceptance lengths decline because most speculators are trained offline on short sequences but must match the target model on much longer outputs at inference, well beyond their training distribution. To address this issue, we propose $\textit{Test-Time Speculation (TTS)}$, an online distillation approach that continuously adapts the speculator at test time. TTS leverages the key insight that the token verification step already invokes the target model for each draft token, providing the training signal needed to adapt the draft at no additional cost. Treating the draft as the student and the target as the teacher, TTS adjusts the draft over several speculation rounds, with each update improving the draft's accuracy as generation proceeds. Our results across multiple models from the Qwen-3, Qwen-3.5, and Llama3.1 families show that TTS improves acceptance lengths over state-of-the-art speculators by up to $72\%$ and by $41\%$ on average, with the benefits growing as generation length increases.
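The draft-then-verify loop and the idea of reusing verification as a distillation signal can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the loss, optimizer, or update schedule, so everything below is an assumption made for illustration. The "target" is a fixed next-token rule, the "draft" is a table of logits indexed by the previous token, verification is greedy (exact match with the target's token), and the names `target_next`, `TabularDraft`, `speculation_round`, and `mean_tokens_per_round` are hypothetical. The sketch also assumes that acceptance length is counted as tokens emitted per round (accepted draft tokens plus one token from the target), which is consistent with the abstract equating a value close to 1 with no speedup.

```python
import math

VOCAB = 8  # toy vocabulary size (assumption)
K = 4      # draft tokens proposed per round (assumption)


def target_next(prev: int) -> int:
    """Toy stand-in for the target model: greedy next token given the previous one."""
    return (3 * prev + 1) % VOCAB


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TabularDraft:
    """Toy draft model: one row of logits per previous token, initialised to zero."""

    def __init__(self):
        self.logits = [[0.0] * VOCAB for _ in range(VOCAB)]

    def next_token(self, prev: int) -> int:
        row = self.logits[prev]
        return max(range(VOCAB), key=row.__getitem__)  # greedy draft token

    def distill_step(self, prev: int, teacher_token: int, lr: float) -> None:
        # Cross-entropy against the teacher's token:
        # gradient w.r.t. the logits is softmax(logits) - one_hot(teacher_token).
        probs = softmax(self.logits[prev])
        for v in range(VOCAB):
            grad = probs[v] - (1.0 if v == teacher_token else 0.0)
            self.logits[prev][v] -= lr * grad


def speculation_round(draft: TabularDraft, prev: int, lr: float):
    """One round: draft K tokens, verify greedily, optionally adapt the draft."""
    proposals, p = [], prev
    for _ in range(K):
        p = draft.next_token(p)
        proposals.append(p)

    accepted, p = [], prev
    for tok in proposals:
        teacher = target_next(p)  # verification already computes this
        if lr > 0.0:
            draft.distill_step(p, teacher, lr)  # reuse it as the training signal
        if tok != teacher:
            break  # first mismatch ends acceptance
        accepted.append(tok)
        p = tok
    # The target always contributes one token: a correction or a bonus token.
    return accepted + [target_next(p)]


def mean_tokens_per_round(n_tokens: int, lr: float) -> float:
    """Generate at least n_tokens and return the mean tokens emitted per round."""
    draft, prev, produced, rounds = TabularDraft(), 0, 0, 0
    while produced < n_tokens:
        emitted = speculation_round(draft, prev, lr)
        produced += len(emitted)
        prev = emitted[-1]
        rounds += 1
    return produced / rounds


if __name__ == "__main__":
    print("frozen draft :", mean_tokens_per_round(200, lr=0.0))
    print("adapted draft:", mean_tokens_per_round(200, lr=1.0))
```

In this toy setting the frozen draft stays near one token per round, while the adapted draft approaches the ceiling of K + 1 once each context has been seen during verification. A real system would replace the table with a neural draft and the one-hot teacher with the target's full distribution; those details are outside what the abstract states.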

Avinash Kumar, Sujay Sanghavi, Poulami Das • 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	OlyBench Math	Acceptance Length: 4.7	12
Physics Reasoning	OlyBench Phy	Acceptance Length: 4.5	12
STEM Theorem Question Answering	TheoremQA	Acceptance Length: 4.4	12
Code Generation	LiveCodeBench	Acceptance Length: 4.7	6
Code Generation	LiveCodeBench	Acceptance Length: 2.1	6
Graduate-level Question Answering	GPQA	Acceptance Length: 2	6
Mathematical Reasoning	AIME 2025	Acceptance Length: 4.9	6
Mathematical Reasoning	MATH 500	Acceptance Length: 4.5	6
Science Question Answering	GPQA	Acceptance Length: 5.2	6
Mathematical Reasoning	AIME 2024	Acceptance Length: 4.6	6

Showing 10 of 11 rows
