Parallel Prefix Verification for Speculative Generation

About

We introduce PARSE (PArallel pRefix Speculative Engine), a speculative generation framework that accelerates large language model (LLM) inference by parallelizing prefix verification at the semantic level. Existing speculative decoding methods are fundamentally limited by token-level equivalence: the target model must verify each token, which leads to short acceptance lengths and modest speedups. Moving to semantic or segment-level verification can substantially increase acceptance granularity, but prior approaches verify segments sequentially, which introduces significant overhead and limits practical gains. PARSE instead introduces parallel prefix verification, which enables semantic-level verification without sequential checks. Given a full draft from a draft model, the target model evaluates the correctness of multiple prefixes in a single forward pass using a custom attention mask and directly identifies the maximal valid prefix. This eliminates sequential segment verification and makes verification compute-efficient. PARSE is orthogonal to token-level speculative decoding and can be composed with it for additional gains. Across models and benchmarks, PARSE delivers $1.25\times$ to $4.3\times$ higher throughput than the target model alone, and $1.6\times$ to $4.5\times$ when composed with EAGLE-3, all with negligible accuracy degradation. These results establish parallel prefix verification as an effective, general approach to accelerating LLM inference.
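The abstract does not say how the custom attention mask is laid out, so the following is only a minimal sketch of one plausible construction, not PARSE's implementation. It assumes the draft has already been split into segments (`seg_ends[i]` is the cumulative token length of prefix i+1) and that each prefix is judged by a short block of hypothetical "verification query" tokens appended after the full draft. Each query block attends to the context plus only its own prefix, never to other query blocks, so all k prefix judgments fit in one forward pass. The function names and the "stop at the first rejected prefix" acceptance rule are illustrative assumptions.

```python
from typing import List


def _check_segments(seg_ends: List[int]) -> None:
    # seg_ends[i] = cumulative draft length (tokens) of prefix i+1 (assumed input format).
    if not seg_ends or seg_ends[0] <= 0:
        raise ValueError("need at least one non-empty segment")
    if any(b <= a for a, b in zip(seg_ends, seg_ends[1:])):
        raise ValueError("seg_ends must be strictly increasing")


def build_prefix_verification_mask(ctx_len: int, seg_ends: List[int], q_len: int) -> List[List[bool]]:
    """Boolean mask; mask[r][c] is True if position r may attend to position c.

    Assumed layout: [context | full draft | query block 1 | ... | query block k],
    where hypothetical query block i (q_len tokens) judges the prefix ending at seg_ends[i].
    """
    _check_segments(seg_ends)
    base = ctx_len + seg_ends[-1]  # context + full draft
    total = base + len(seg_ends) * q_len
    mask = [[False] * total for _ in range(total)]
    # Context and draft use ordinary causal attention.
    for r in range(base):
        for c in range(r + 1):
            mask[r][c] = True
    for i, end in enumerate(seg_ends):
        q_start = base + i * q_len
        for r in range(q_start, q_start + q_len):
            # Each query block sees the context plus only its own draft prefix...
            for c in range(ctx_len + end):
                mask[r][c] = True
            # ...and earlier tokens of its own block, but never other query blocks.
            for c in range(q_start, r + 1):
                mask[r][c] = True
    return mask


def build_position_ids(ctx_len: int, seg_ends: List[int], q_len: int) -> List[int]:
    """Position ids: query block i continues numbering right after its prefix ends."""
    _check_segments(seg_ends)
    pos = list(range(ctx_len + seg_ends[-1]))
    for end in seg_ends:
        pos.extend(range(ctx_len + end, ctx_len + end + q_len))
    return pos


def accepted_draft_tokens(seg_ends: List[int], verdicts: List[bool]) -> int:
    """Draft tokens kept: the longest run of accepted prefixes starting from the first.

    Assumption: a rejected prefix also invalidates every longer prefix.
    """
    accepted = 0
    for end, ok in zip(seg_ends, verdicts):
        if not ok:
            break
        accepted = end
    return accepted
```

For example, with a 2-token context, segments ending at draft tokens 2 and 3, and one query token per prefix, `build_position_ids(2, [2, 3], 1)` yields `[0, 1, 2, 3, 4, 4, 5]`: both query tokens sit immediately after their own prefix, as they would in a separate sequential check. How a verdict is read out of each query block's logits is not specified in the abstract and is left open here. The nested Python lists keep the sketch dependency-free; a practical version would build the mask as a tensor for the attention kernel.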

Yuncheng Yao, Yuxuan Xia, Shengjie Wang, Danyang Zhuo · 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | MATH | Speedup (×) | 2.83 | 46 |
| Language Understanding | MMLU-Pro | -- | -- | 20 |
| Code Generation | MBPP+ | Throughput (tok/s) | 170.9 | 6 |
| Code Generation | HumanEval+ | Throughput (tok/s) | 193.1 | 4 |
| Dialogue | MT-Bench | Throughput (tok/s) | 194 | 4 |
| Language Modeling | MMLU | Throughput (tok/s) | 216.6 | 4 |
| Language Modeling | MMLU-Pro | Throughput (tok/s) | 191.3 | 4 |
| Mathematical Reasoning | GSM8K | Throughput (tok/s) | 420.7 | 4 |
| Science Question Answering | GPQA | Throughput (tok/s) | 151.7 | 4 |
| Instruction Following | MT-Bench | MT-Bench score | 9.11 | 3 |

Showing 10 of 14 rows.
