Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation

About

Speculative decoding accelerates LLM inference, but its performance degrades when the target model is fine-tuned for a specific domain. A naive solution is to retrain the draft model for every target model, which is costly and inefficient. To address this, we introduce Efficient Draft Adaptation (EDA), a parameter- and data-efficient framework for adapting draft models. EDA introduces three innovations: (1) a decoupled architecture that uses shared and private components to model the shared and target-specific output distributions separately, enabling parameter-efficient adaptation by updating only the lightweight private component; (2) a data regeneration strategy that uses the fine-tuned target model to regenerate training data, improving the alignment between training and speculative decoding and leading to a higher average acceptance length; (3) a sample selection mechanism that prioritizes high-value data for efficient adaptation. Our experiments show that EDA effectively restores speculative performance on fine-tuned models, achieving superior average acceptance lengths with significantly reduced training costs compared to full retraining. Code is available at https://github.com/Lyn-Lucy/Efficient-Draft-Adaptation.

Luxi Lin, Zhihang Lin, Zhanpeng Zeng, Yuhao Chen, Qingyu Zhang, Jixiang Luo, Xuelong Li, Rongrong Ji • 2026

Related benchmarks

Task	Dataset	Metric	Result
Mathematical Reasoning	GSM8K	Speed Up (x)	3.06	246
Code Generation	HumanEval	Speedup Factor	3.36	147
Code Generation	MBPP	Speedup	3.18	79
Medical Question Answering	MedMCQA	Tau Correlation	4.3	13
Mathematical Reasoning	MathQA	Average Acceptance Length τ	5.16	12
Code Generation	APPS	Tau	5.65	10
Code Generation	BigCodeBench	tau	4.18	10
Mathematical Reasoning	AIME 2024	Average Acceptance Length (τ)	5.41	10
Mathematical Reasoning	SVAMP	Average Acceptance Length	4.96	10
Medical Question Answering	MedQA USMLE	Kendall's Tau (τ)	4.34	10

Showing 10 of 15 rows
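
For readers unfamiliar with the average acceptance length τ reported in the benchmarks above, the following is a minimal sketch of how such a quantity can be computed under greedy verification. It is not taken from the EDA code base: the function names, the token IDs, and the convention that each draft-then-verify cycle counts the matched draft tokens plus one token supplied by the target model are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def accepted_length(draft: Sequence[int], target: Sequence[int]) -> int:
    """Tokens produced by one draft-then-verify cycle (greedy verification).

    Assumed convention: `draft` holds the k tokens proposed by the draft
    model, and `target` holds the k + 1 tokens the target model predicts at
    the same positions when conditioned on the draft prefix.
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must contain exactly one more token than draft")
    matched = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        matched += 1
    # The target's own token at the first mismatch (or after a full match)
    # is always kept, so every cycle yields at least one token.
    return matched + 1


def average_acceptance_length(cycles: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Mean number of tokens produced per verification cycle."""
    if not cycles:
        raise ValueError("need at least one cycle")
    return sum(accepted_length(d, t) for d, t in cycles) / len(cycles)


# Hypothetical token IDs, for illustration only.
cycles = [
    ([5, 7, 9], [5, 7, 2, 4]),  # two matches, then a correction: 3 tokens
    ([1, 2, 3], [1, 2, 3, 8]),  # all three match, plus a bonus token: 4 tokens
    ([6, 1, 1], [6, 0, 2, 3]),  # one match, then a correction: 2 tokens
]
tau = average_acceptance_length(cycles)
print(tau)
```

Under this convention a poorly aligned draft model yields values close to 1, since only the target model's own token survives each cycle. This is why realigning the draft model with a fine-tuned target is expected to raise τ.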
