Parallel Prefix Verification for Speculative Generation
About
We introduce PARSE (PArallel pRefix Speculative Engine), a speculative generation framework that accelerates large language model (LLM) inference by parallelizing prefix verification at the semantic level. Existing speculative decoding methods are fundamentally limited by token-level equivalence: the target model must verify each token, which leads to short acceptance lengths and modest speedups. Moving to semantic or segment-level verification can substantially increase acceptance granularity, but prior approaches rely on sequential verification, which introduces significant overhead and limits practical gains. PARSE introduces parallel prefix verification, enabling semantic-level verification without sequential checks. Given a full draft from a draft model, the target model evaluates correctness across multiple prefixes in a single forward pass using a custom attention mask, directly identifying the maximal valid prefix. This eliminates sequential segment verification and makes verification compute-efficient. PARSE is orthogonal to token-level speculative decoding and can be composed with it for additional gains. Across models and benchmarks, PARSE delivers $1.25\times$ to $4.3\times$ throughput gains over the target model, and $1.6\times$ to $4.5\times$ when composed with EAGLE-3, all with negligible accuracy degradation. These results demonstrate that parallel prefix verification is an effective, general approach to accelerating LLM inference.
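The abstract does not specify how the custom attention mask is laid out, so the following is only an illustrative sketch of one way several prefixes could be checked in a single pass. It assumes a hypothetical sequence layout of prompt tokens, then draft tokens, then one "verify slot" per candidate prefix, where each slot may attend only to the prompt and to its own prefix of the draft. The function names, the layout, and the rule that a prefix counts as valid only if every shorter prefix is also valid are assumptions made for illustration, not details taken from PARSE.

```python
from typing import List, Sequence


def build_prefix_verification_mask(
    prompt_len: int, boundaries: Sequence[int]
) -> List[List[bool]]:
    """Build mask[q][k], where True means query position q may attend to key k.

    Assumed (hypothetical) layout: [prompt | draft | one verify slot per prefix].
    boundaries[i] is the number of draft tokens in the i-th candidate prefix,
    given in ascending order; the last entry is the full draft length.
    """
    draft_len = boundaries[-1]
    ctx_len = prompt_len + draft_len
    total = ctx_len + len(boundaries)
    mask = [[False] * total for _ in range(total)]

    # Prompt and draft tokens use ordinary causal attention.
    for q in range(ctx_len):
        for k in range(q + 1):
            mask[q][k] = True

    # Each verify slot sees the prompt, its own draft prefix, and itself.
    # Slots never see each other, so all prefixes are judged independently
    # within the same forward pass.
    for i, b in enumerate(boundaries):
        q = ctx_len + i
        for k in range(prompt_len + b):
            mask[q][k] = True
        mask[q][q] = True
    return mask


def maximal_valid_prefix(boundaries: Sequence[int], verdicts: Sequence[bool]) -> int:
    """Return the draft length to accept, given one verdict per prefix.

    Assumption: a prefix is accepted only if all shorter prefixes were
    also judged valid, so we stop at the first rejected prefix.
    """
    accepted = 0
    for b, ok in zip(boundaries, verdicts):
        if not ok:
            break
        accepted = b
    return accepted


if __name__ == "__main__":
    # Two prompt tokens and a three-token draft checked at prefixes of length 1 and 3.
    mask = build_prefix_verification_mask(prompt_len=2, boundaries=[1, 3])
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
    print(maximal_valid_prefix([1, 3], [True, False]))
```

In this sketch the per-prefix verdicts would come from the target model's outputs at the verify slots; how those outputs are turned into accept or reject decisions is left open here because the abstract does not describe it.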
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | MATH | Speedup | 2.83 | 46 |
| Language Understanding | MMLU-Pro | -- | -- | 20 |
| Code Generation | MBPP+ | TPS (tok/s) | 170.9 | 6 |
| Code Generation | HumanEval+ | Throughput (tok/s) | 193.1 | 4 |
| Dialogue | MT-Bench | TPS (tok/s) | 194 | 4 |
| Language Modeling | MMLU | Throughput (tok/s) | 216.6 | 4 |
| Language Modeling | MMLU-Pro | Tokens Per Second (TPS) | 191.3 | 4 |
| Mathematical Reasoning | GSM8K | Throughput (tok/s) | 420.7 | 4 |
| Science Question Answering | GPQA | TPS (tok/s) | 151.7 | 4 |
| Instruction Following | MT-Bench | MT-Bench Score | 9.11 | 3 |