Attention Drift: What Autoregressive Speculative Decoding Models Learn

About

Speculative decoding accelerates LLM inference by drafting future tokens with a small model, but drafter models degrade sharply under template perturbation and long-context inputs. We identify a previously unreported phenomenon we call attention drift: as the drafter generates successive tokens within a speculation chain, its attention progressively moves from the prompt onto its own recently generated tokens. We observe this across both EAGLE3 drafters and MTP heads, suggesting that drift is a property of drafter designs. We trace it to the un-normalized residual path between chain steps: the drafter's hidden-state magnitude grows monotonically with chain depth, and the drafter exhibits dynamics consistent with additional pre-norm transformer layers stacked on the target rather than those of a standalone autoregressive predictor. To limit this growth, we propose two architectural changes: post-norm on the drafter hidden states and per-hidden-state RMSNorm after capturing target hidden states. Our interventions improve acceptance length over the current leading model, pre-norm EAGLE3, by up to $2\times$ under template perturbation, $1.18\times$ on long-context tasks, and $1.10\times$ on seven standard benchmarks spanning multi-turn chat, math, and coding. Our changes also allow shorter train-time-test depths to generalize to longer drafting sequences.
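
The norm growth and the two proposed normalizations can be illustrated with a toy sketch. This is not the authors' implementation: `toy_block` is a hypothetical stand-in for a pre-norm drafter layer (a cyclic shift in place of attention and MLP mixing), RMSNorm is used without a learned gain, and the placement of the post-norm and the per-state normalization is one reading of the abstract. It only shows that an un-normalized residual path lets the hidden-state magnitude grow with chain depth, while normalizing after each step keeps it bounded.

```python
import math

def rms_norm(x, eps=1e-6):
    """RMSNorm without a learned gain: rescale x to (approximately) unit RMS."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def l2_norm(x):
    return math.sqrt(sum(v * v for v in x))

def toy_block(h):
    """Hypothetical stand-in for a pre-norm drafter layer.

    Normalizes its input, then applies a fixed cyclic shift as a placeholder
    for attention/MLP mixing. Not the real drafter computation.
    """
    n = rms_norm(h)
    return n[1:] + n[:1]

def run_chain(h0, depth, post_norm):
    """Return the L2 norm of the hidden state after each chain step."""
    h = list(h0)
    norms = []
    for _ in range(depth):
        # Residual path carried from one chain step to the next.
        h = [a + b for a, b in zip(h, toy_block(h))]
        if post_norm:
            # Assumed placement: normalize the state handed to the next step.
            h = rms_norm(h)
        norms.append(l2_norm(h))
    return norms

def fuse_target_states(states):
    """Normalize each captured target hidden state separately, then concatenate.

    Assumption: "per-hidden-state RMSNorm" means one normalization per
    captured state, so no single state dominates the fused input by scale.
    """
    fused = []
    for s in states:
        fused.extend(rms_norm(s))
    return fused

h0 = [float(i) for i in range(1, 9)]
drifting = run_chain(h0, depth=6, post_norm=False)
bounded = run_chain(h0, depth=6, post_norm=True)
print("no post-norm:", [round(n, 2) for n in drifting])
print("post-norm:   ", [round(n, 2) for n in bounded])
```

In this toy setting the un-normalized chain's norm increases at every step, while the post-norm chain stays at roughly the square root of the hidden dimension. `fuse_target_states` brings captured states of very different scales to the same RMS before they are combined.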

Doğaç Eldenk, Payal Mohapatra, Yigitcan Comlek, Kaan Oktay, Hongyang Zhang, Stephen Xia • 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MATH 500	--	600
Multi-turn conversation	MT-Bench	--	76
Instruction Following	Alpaca	Average Accepted Length: 3.57	51
Code Generation	LiveCodeBench	Mean Acceptance Length (τ): 4.71	22
Scientific Question Answering	GPQA	Avg Response Length: 4.78	13
Multi-turn Chat Evaluation	MT-Bench	Acceptance Length: 3.61	8
Instruction Following	Alpaca	Acceptance Length: 3.59	6
Code Generation	HumanEval	Acceptance Length: 5.21	4
Code Generation	HumanEval	SGLang Acceptance Length: 4.49	4
Coding	LongBench Repobench (test)	Avg Accepted Draft Tokens/Round: 2.26	4

