Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing

About

Speculative decoding accelerates LLM inference by having a lightweight draft model propose speculative windows of candidate tokens for parallel verification by a larger target model. In practice, speculative efficiency is often bottlenecked by hard-to-draft positions, where an early mismatch truncates the accepted prefix and invalidates the rest of the speculative window. Most learning-based drafters are still optimized with token-level supervised objectives, even though speculative utility is inherently window-level and prefix-sensitive. We propose PPOW (Performance-Driven Policy Optimization with Adaptive Windowing), a reinforcement learning framework that shifts drafter optimization from token-level imitation to window-level optimization. PPOW combines a Cost-Aware Speedup Reward, a Distribution-Based Proximity Reward, and Adaptive Divergence-Aware Windowing, which prioritizes informative windows with high confidence-weighted draft-target divergence. PPOW achieves average acceptance lengths of 6.29 to 6.52 and speedups of 3.39× to 4.36× across multiple model families and benchmarks under a unified decoding protocol. These results show that performance-driven window-level optimization is a practical approach to improving speculative decoding efficiency.
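
To make the prefix-sensitivity point concrete, the following is a minimal sketch of how an early mismatch truncates a speculative window. It assumes greedy verification, where a drafted token is accepted only if it equals the target model's token at the same position. The function names and the toy token IDs are hypothetical and are not taken from the paper, and the sketch does not implement PPOW's rewards or windowing.

```python
from typing import List, Sequence, Tuple


def accepted_prefix_length(draft: Sequence[int], target: Sequence[int]) -> int:
    """Count drafted tokens accepted before the first mismatch.

    Assumes greedy verification: position i is accepted only if the draft
    token equals the target token AND every earlier position was accepted.
    """
    accepted = 0
    for draft_token, target_token in zip(draft, target):
        if draft_token != target_token:
            break  # everything after the first mismatch is discarded
        accepted += 1
    return accepted


def mean_accepted_length(windows: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average accepted-prefix length over (draft, target) window pairs."""
    if not windows:
        raise ValueError("need at least one window")
    total = sum(accepted_prefix_length(d, t) for d, t in windows)
    return total / len(windows)


if __name__ == "__main__":
    # Same number of token-level matches (3 of 4), very different utility:
    late_miss = ([1, 2, 3, 4], [1, 2, 3, 9])   # mismatch at the last position
    early_miss = ([1, 2, 3, 4], [9, 2, 3, 4])  # mismatch at the first position
    print(accepted_prefix_length(*late_miss))
    print(accepted_prefix_length(*early_miss))
    print(mean_accepted_length([late_miss, early_miss]))
```

Both toy windows agree with the target on three of four tokens, so a token-level objective would score them the same, yet only the first yields any accepted tokens. Reported acceptance lengths (τ) in the speculative decoding literature often also count the extra token the target model emits at each verification step; this sketch counts accepted draft tokens only.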

Jie Jiang, Xing Sun, Ruotian Chen, Jianan Su, Kaixin Shen • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Code Generation | HumanEval | Speedup Factor | 4.87 | 147 |
| General speculative decoding performance | Mean (MT-Bench, HumanEval, GSM8K) | Average Acceptance Length (τ) | 6.52 | 112 |
| Code Generation | HumanEval | Average Acceptance Length (τ) | 7.23 | 20 |
| Mathematical Reasoning | GSM8K | Average Acceptance Length (τ) | 6.97 | 20 |
| Multi-turn dialogue | MT-Bench | Average Acceptance Length (τ) | 5.78 | 20 |
| Summarization | X-SUM | Average Acceptance Length (τ) | 5.13 | 3 |
| Machine Translation | WMT14 | Average Acceptance Length (τ) | 2.97 | 3 |