
Dynamic Depth Decoding: Faster Speculative Decoding for LLMs

About

The acceleration of Large Language Models (LLMs) with speculative decoding provides a significant runtime improvement without any loss of accuracy. Currently, EAGLE-2 is the state-of-the-art speculative decoding method, improving on EAGLE with a dynamic draft tree. We introduce Dynamic Depth Decoding (DDD), which optimises EAGLE-2's tree drafting method using a dynamic depth. This extends the average speedup that EAGLE-2 achieves over EAGLE by 44%, giving DDD an average speedup of 3.16x.
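To illustrate why speculative decoding can be lossless, here is a minimal sketch of the generic draft-then-verify loop under greedy decoding. It is not the DDD or EAGLE-2 algorithm: it drafts a single chain of fixed length rather than a dynamic tree, and `target_model`, `draft_model`, and the `depth` parameter are hypothetical toy stand-ins introduced only for this example. Because every token that is kept is the target model's own greedy choice, the output matches plain target-only decoding.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]  # maps a prefix to its greedy next token


def target_model(prefix: List[Token]) -> Token:
    # Hypothetical "large" model: a deterministic function of the prefix.
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[Token]) -> Token:
    # Hypothetical "small" model: agrees with the target except when the
    # prefix length is a multiple of 4, where it guesses wrong on purpose.
    guess = target_model(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[Token], num_tokens: int) -> List[Token]:
    # Baseline: one model call per generated token.
    tokens = list(prompt)
    for _ in range(num_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: Model, draft: Model, prompt: List[Token], num_tokens: int, depth: int
) -> Tuple[List[Token], int]:
    tokens = list(prompt)
    goal = len(prompt) + num_tokens
    rounds = 0  # each round stands in for one (batched) target verification pass
    while len(tokens) < goal:
        rounds += 1
        # Draft phase: the small model proposes `depth` tokens autoregressively.
        drafted: List[Token] = []
        for _ in range(depth):
            drafted.append(draft(tokens + drafted))
        # Verify phase: keep the target's token at every position and stop at
        # the first position where the draft disagrees with it.
        accepted: List[Token] = []
        for guess in drafted:
            expected = target(tokens + accepted)
            accepted.append(expected)
            if guess != expected:
                break
        else:
            # All drafts matched, so the target contributes one extra token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = greedy_decode(target_model, prompt, 20)
    output, rounds = speculative_decode(target_model, draft_model, prompt, 20, depth=4)
    print("identical to baseline:", output == baseline)
    print("verification rounds:", rounds, "for 20 generated tokens")
```

In this toy setting the number of verification rounds is smaller than the number of generated tokens whenever the draft model is often right, which is where the speedup comes from. The best draft length depends on how often drafts are accepted, which is the kind of trade-off a dynamic depth is meant to address.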

Oscar Brown, Zhengjie Wang, Andrea Do, Nikhil Mathew, Cheng Yu • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Speedup (x) | 3.58 | 246 |
| Instruction Following | Alpaca | Speedup (x) | 3.43 | 111 |
| Question Answering | QA | Speedup Factor | 2.96 | 47 |
| Multi-turn Conversation | MT-Bench | Speedup | 4.15 | 25 |
| Multi-turn Conversation Evaluation | MT-Bench | Speedup | 3.4 | 25 |