Block Verification Accelerates Speculative Decoding

About

Speculative decoding is an effective method for lossless acceleration of large language models during inference. It uses a fast model to draft a block of tokens, which are then verified in parallel by the target model, and it guarantees that the output is distributed identically to a sample from the target model. In prior work, draft verification is performed independently, token by token. Surprisingly, we show that this approach is not optimal. We propose Block Verification, a simple draft verification algorithm that verifies the entire block jointly and provides additional wall-clock speedup. We prove that the proposed mechanism is optimal in the expected number of tokens produced each iteration and, specifically, is never worse than the standard token-level verification. Empirically, block verification provides modest but consistent wall-clock speedups of 5%-8% over the standard token verification algorithm across a range of tasks and datasets. Given that block verification does not increase code complexity, maintains the strong lossless guarantee of the standard speculative decoding verification algorithm, cannot deteriorate performance, and, in fact, consistently improves it, it can be used as a good default in speculative decoding implementations.

Ziteng Sun, Uri Mendlovic, Yaniv Leviathan, Asaf Aharoni, Jae Hun Ro, Ahmad Beirami, Ananda Theertha Suresh • 2024
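
For readers unfamiliar with the baseline that the abstract contrasts against, the following is a minimal sketch of standard token-level verification as it is commonly described for speculative decoding. It is not the paper's block verification algorithm. The function names (`token_verify`, `_sample`) and the plain-list representation of probability distributions are assumptions made for illustration only.

```python
import random


def _sample(weights, rng):
    # Hypothetical helper: draw one index in proportion to non-negative weights.
    return rng.choices(range(len(weights)), weights=weights)[0]


def token_verify(draft_tokens, q_probs, p_probs, rng=random):
    """Token-by-token verification of a drafted block (baseline sketch).

    draft_tokens: token ids proposed by the draft model (length gamma).
    q_probs[i]:   draft-model distribution that draft_tokens[i] was sampled from.
    p_probs[i]:   target-model distribution at position i; p_probs has
                  gamma + 1 entries, the last one for the bonus token.
    Returns the tokens emitted in this iteration.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        p, q = p_probs[i], q_probs[i]
        # Accept the drafted token with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            continue
        # On rejection, resample from the residual max(p - q, 0) and stop.
        residual = [max(pj - qj, 0.0) for pj, qj in zip(p, q)]
        out.append(_sample(residual, rng))
        return out
    # Every drafted token was accepted: draw one extra token from the target.
    out.append(_sample(p_probs[len(draft_tokens)], rng))
    return out
```

Each position is accepted or rejected using only its own pair of distributions, which is the independent, per-token decision that the abstract says block verification replaces with a joint decision over the whole block.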

Related benchmarks

Task | Dataset | Result | Rank
Multilingual Mathematical Reasoning | MGSM (test) | -- | 109
Instruction Following | Alpaca (test) | SR Score 1.41 | 21
Language Modeling | LM1B (test) | Block Efficiency 8.42 | 15
Python Programming | HumanEval (test) | BE Score 9.6 | 10
Speculative Decoding | HumanEval (test) | BE 7.59 | 10
Speculative Decoding | GSM8K (test) | BE Score 7.52 | 10
Speculative Decoding | MGSM (test) | BE 6.89 | 10
Speculative Decoding | LM1B (test) | BE 7.52 | 10
Speculative Decoding | Alpaca (test) | BE Score 7.2 | 10
grade-school math | GSM8K (test) | BE Score 5.99 | 10