EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation

About

Speculative decoding accelerates Large Language Model inference through draft-then-verify generation, yet lightweight draft models face coupled efficiency and quality limitations: large-vocabulary output projection is costly, while limited draft capacity and static parameters reduce acceptance under specialized or shifting inputs. Vocabulary pruning lowers projection cost, but static variants miss locally important long-tail tokens, while dynamic variants remain sensitive to preset selection policies and budgets. Moreover, limited draft capacity can leave the draft distribution misaligned even when the target token is covered. Online alignment improves draft quality, but full-parameter updates introduce substantial memory and latency overhead. We introduce EvoSpec, which jointly adapts the active vocabulary and lightweight draft parameters from verification feedback. EvoSpec asynchronously retrieves semantic and statistical token neighbors and performs curriculum-weighted online LoRA alignment while preserving exact target-model verification. On Qwen3-8B/EAGLE-2, EvoSpec reaches a $2.18\times$ speedup over vanilla decoding and a $1.20\times$ gain over EAGLE-2, while improving specialized-domain coverage and using $27\%$ less auxiliary GPU adaptation memory than full-parameter online adaptation.
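The draft-then-verify loop and the effect of active-vocabulary coverage can be illustrated with a toy sketch. This is not EvoSpec's implementation: the names `toy_target`, `make_draft`, `vanilla_decode`, and `speculative_decode` are hypothetical, the "models" are deterministic functions over integer tokens, decoding is greedy, and the target is called once per position rather than in a single batched verification pass. The sketch only shows that exact target verification leaves the output unchanged while acceptance depends on whether the draft's active vocabulary covers the target's tokens.

```python
from typing import Callable, List, Set, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]

VOCAB_SIZE = 50  # toy vocabulary size (assumption for illustration)


def toy_target(seq: List[Token]) -> Token:
    """Hypothetical stand-in for the target model's greedy next token."""
    return (31 * seq[-1] + 7 * len(seq)) % VOCAB_SIZE


def make_draft(target: NextToken, active_vocab: Set[Token]) -> NextToken:
    """Toy draft: agrees with the target only when the token is in its active vocabulary."""
    fallback = min(active_vocab)

    def draft(seq: List[Token]) -> Token:
        t = target(seq)
        return t if t in active_vocab else fallback

    return draft


def vanilla_decode(target: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Plain greedy decoding with the target model only."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], float]:
    """Greedy draft-then-verify; returns the sequence and the acceptance rate."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    accepted = verified = 0
    while len(seq) < goal:
        # Draft phase: propose k tokens autoregressively.
        ctx, guesses = list(seq), []
        for _ in range(k):
            g = draft(ctx)
            guesses.append(g)
            ctx.append(g)
        # Verify phase: keep the longest prefix the target agrees with.
        for g in guesses:
            if len(seq) >= goal:
                break
            t = target(seq)
            verified += 1
            seq.append(t)  # the target's token is always what gets emitted
            if g != t:
                break  # first mismatch: discard the remaining guesses
            accepted += 1
        else:
            # All k guesses accepted: the target contributes one extra token.
            if len(seq) < goal:
                seq.append(target(seq))
    return seq, accepted / max(verified, 1)


prompt = [3, 14]
reference = vanilla_decode(toy_target, prompt, 40)
out_small, rate_small = speculative_decode(
    toy_target, make_draft(toy_target, set(range(10))), prompt, 40
)
out_full, rate_full = speculative_decode(
    toy_target, make_draft(toy_target, set(range(VOCAB_SIZE))), prompt, 40
)
```

In this toy setting both speculative runs reproduce `reference` exactly; only the acceptance rate differs with the size of the active vocabulary, which is the quantity that vocabulary adaptation aims to raise.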

Shuyu Zhang, Lingfeng Pan, Qicheng Wang, Yaqi Shi, Yueyang Tan, Ruyu Yan, Jiaqi Chen, Lixing Du, Lu Wang • 2026

Related benchmarks

Task	Dataset	Metric	Value	
Speculative Decoding	Spec-Bench	MT Score	3.31	57
Speculative Decoding	HumanEval	--	--	52
Speculative Decoding	Code	Throughput (tokens/s)	138.7	22
Speculative Decoding	Law	Throughput (tokens/s)	132.7	22
Speculative Decoding	Med	Throughput (tokens/s)	128.5	22
Speculative Decoding Inference	Pile of Law	Inference Speed (tokens/s)	181.9	12
Speculative Decoding Inference	PubMedQA	Throughput (tokens/s)	182.2	12
Speculative Decoding Inference	Specialized Datasets Aggregate	Average Speed (tokens/s)	172.7	12
Speculative Decoding	Average Code, Law, Med	Throughput (tokens/s)	133.3	11
Speculative Decoding	Specialized Domains Average	Throughput (tokens/s)	114.3	11

Showing 10 of 11 rows
