
Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation

About

We propose Speculative Decoding (SpecDec), the first work to formally study exploiting speculative execution to accelerate autoregressive (AR) decoding. SpecDec has two innovations: Spec-Drafter, an independent model specially optimized for efficient and accurate drafting, and Spec-Verification, a reliable method for efficiently verifying the drafted tokens during decoding. Experimental results on various seq2seq tasks, including machine translation and abstractive summarization, show that our approach achieves around $5\times$ speedup for popular Transformer architectures with generation quality comparable to beam search decoding. This challenges the prevailing impression that the draft-then-verify paradigm yields only $1.4\times$ to $2\times$ speedup. Beyond the speedup, we demonstrate three further advantages of SpecDec, revealing its practical value for accelerating generative models in real-world applications. Our models and code are available at https://github.com/hemingkx/SpecDec.

Heming Xia, Tao Ge, Peiyi Wang, Si-Qing Chen, Furu Wei, Zhifang Sui • 2022
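The abstract describes the draft-then-verify paradigm only at a high level. The following is a minimal sketch of the general idea under simplifying assumptions, not SpecDec's implementation: "models" are plain Python functions returning next-token scores, and all names (`draft_then_verify`, `greedy_decode`, `toy_target`, `toy_drafter`) are hypothetical. Verification here uses strict top-1 (greedy) matching, which is a stand-in; the abstract does not specify the Spec-Verification criterion, and Spec-Drafter's architecture is not modeled at all.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy interface (not SpecDec's code): a "model" maps a token
# prefix to a list of next-token scores over a small vocabulary.
Model = Callable[[Sequence[int]], List[float]]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score; ties go to the lowest index."""
    return max(range(len(scores)), key=lambda i: scores[i])


def greedy_decode(target: Model, prefix: List[int], max_new_tokens: int, eos: int) -> List[int]:
    """Reference: ordinary one-token-per-step greedy decoding with the target."""
    out = list(prefix)
    for _ in range(max_new_tokens):
        out.append(argmax(target(out)))
        if out[-1] == eos:
            break
    return out


def draft_then_verify(
    target: Model,
    drafter: Model,
    prefix: List[int],
    block_size: int,
    max_new_tokens: int,
    eos: int,
) -> Tuple[List[int], int]:
    """Greedy draft-then-verify; returns (tokens, number of verification rounds)."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    out = list(prefix)
    limit = len(prefix) + max_new_tokens
    rounds = 0
    done = False
    while not done and len(out) < limit:
        # 1) Draft: the cheap model greedily proposes up to block_size tokens.
        draft: List[int] = []
        while len(draft) < min(block_size, limit - len(out)):
            draft.append(argmax(drafter(out + draft)))
            if draft[-1] == eos:
                break
        # 2) Verify: compare each drafted token with the target's greedy choice.
        #    A real system scores all drafted positions in ONE parallel forward
        #    pass of the target; this loop only emulates that pass.
        rounds += 1
        for tok in draft:
            expected = argmax(target(out))
            out.append(expected)  # the target's own token is always kept
            if expected == eos:
                done = True
                break
            if tok != expected:
                break  # first mismatch: discard the rest of the draft
    return out, rounds


# Toy models over a 10-token vocabulary where token 9 plays the role of EOS.
VOCAB, EOS = 10, 9


def toy_target(prefix: Sequence[int]) -> List[float]:
    nxt = (prefix[-1] + 1) % VOCAB  # target always predicts "previous + 1"
    return [1.0 if i == nxt else 0.0 for i in range(VOCAB)]


def toy_drafter(prefix: Sequence[int]) -> List[float]:
    step = 2 if len(prefix) % 4 == 0 else 1  # deliberately wrong at some positions
    nxt = (prefix[-1] + step) % VOCAB
    return [1.0 if i == nxt else 0.0 for i in range(VOCAB)]


if __name__ == "__main__":
    tokens, rounds = draft_then_verify(toy_target, toy_drafter, [0], 4, 20, EOS)
    assert tokens == greedy_decode(toy_target, [0], 20, EOS)
    print(tokens, f"({rounds} verification rounds)")
```

Because every appended token is the target's own greedy choice, the output always equals plain greedy decoding with the target, even when the drafter errs. In this toy run the 9 generated tokens are checked in 3 verification rounds (each of which would be one parallel target pass in a real system) instead of 9 sequential target steps; real speedups depend on how often drafts are accepted and on the cost of the drafter and the verification pass.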

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Machine Translation | WMT'16 Romanian-English (Ro-En) (test) | -- | -- | 21 |
| Machine Translation | WMT Romanian-English 2016 | BLEU | 35.03 | 14 |
| Abstractive Summarization | CNN/DailyMail (test) | ROUGE-1 | 43.11 | 8 |
| Behavioral Consistency | CNN/DailyMail | Relative BLEU | 86.52 | 8 |
| Machine Translation | WMT English-German 2014 | Relative Decoding Speed | 5.1× | 7 |
| Machine Translation | WMT German-English 2014 | Relative Decoding Speed | 5.5× | 7 |
| Machine Translation | WMT English-Romanian (EN-RO) '16 | Relative Decoding Speed | 4.6× | 7 |
| Machine Translation | WMT English-Romanian (EN-RO) 2016 | BLEU | 35.45 | 7 |
| Language Modeling | WikiText-103 | Latency (ms/token) | 21.8 | 5 |
| Abstractive Summarization | XSum | ROUGE-1 | 38.6 | 5 |
(10 of 11 benchmark rows shown.)
