TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees

About

Speculative decoding (SD) has become a standard technique for accelerating LLM inference without sacrificing output quality. Recent work has shifted from sequential chain-based drafting to tree-structured generation, where the draft model constructs a tree of candidate tokens to explore multiple possible drafts in parallel. However, existing tree-based SD methods typically build a fixed-width, fixed-depth draft tree, which fails to adapt to the varying difficulty of tokens and contexts. As a result, the draft model cannot adjust the tree structure dynamically: it can neither stop early on difficult tokens nor extend drafting on easy ones. To address this limitation, we introduce TALON, a training-free, budget-driven adaptive tree expansion framework that can be plugged into existing tree-based methods. Unlike static methods, TALON constructs the draft tree iteratively until a fixed token budget is met, using a hybrid expansion strategy that adaptively allocates the node budget to each layer of the draft tree. This framework naturally shapes the draft tree into a "deep-and-narrow" form for deterministic contexts and a "shallow-and-wide" form for uncertain branches, effectively optimizing the trade-off between exploration width and generation depth under a given budget. Extensive experiments across 5 models and 6 datasets demonstrate that TALON consistently outperforms the state-of-the-art EAGLE-3, achieving up to 5.16x end-to-end speedup over autoregressive decoding.
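To make the budget-driven idea concrete, here is a minimal sketch of how a fixed node budget can shape a draft tree differently depending on the draft model's confidence. It is not TALON's hybrid expansion strategy, which the abstract does not specify; it is a plain best-first expansion by cumulative path probability, used only for illustration. The names `build_draft_tree`, `tree_shape`, `confident_draft`, and `uncertain_draft`, the `top_k` parameter, and the toy probability tables are all assumptions introduced here.

```python
import heapq
from collections import Counter
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, ...]
# Hypothetical draft-model interface: token prefix -> {next token: probability}.
DraftFn = Callable[[Path], Dict[str, float]]


def build_draft_tree(draft: DraftFn, budget: int, top_k: int = 3) -> List[Tuple[Path, float]]:
    """Add nodes in order of cumulative path probability until `budget` nodes exist.

    Illustrative best-first expansion only; not the method described in the paper.
    """
    tree: List[Tuple[Path, float]] = []
    frontier: List[Tuple[float, Path]] = []  # min-heap on negated path probability

    def push_children(prefix: Path, prefix_prob: float) -> None:
        # Keep only the top_k most likely continuations of this prefix.
        ranked = sorted(draft(prefix).items(), key=lambda kv: -kv[1])[:top_k]
        for token, prob in ranked:
            heapq.heappush(frontier, (-(prefix_prob * prob), prefix + (token,)))

    push_children((), 1.0)
    while frontier and len(tree) < budget:
        neg_prob, path = heapq.heappop(frontier)
        tree.append((path, -neg_prob))
        push_children(path, -neg_prob)
    return tree


def tree_shape(tree: List[Tuple[Path, float]]) -> Tuple[int, int]:
    """Return (depth, widest layer) of the tree."""
    per_layer = Counter(len(path) for path, _ in tree)
    return max(per_layer), max(per_layer.values())


def confident_draft(prefix: Path) -> Dict[str, float]:
    # Toy "deterministic context": one token dominates at every step.
    return {"a": 0.90, "b": 0.05, "c": 0.05}


def uncertain_draft(prefix: Path) -> Dict[str, float]:
    # Toy "uncertain context": near-uniform distribution over three tokens.
    return {"a": 0.34, "b": 0.33, "c": 0.33}


if __name__ == "__main__":
    print(tree_shape(build_draft_tree(confident_draft, budget=6)))
    print(tree_shape(build_draft_tree(uncertain_draft, budget=6)))
```

With the same budget of six nodes, the confident toy model yields a single chain (deep and narrow), while the near-uniform one spends its budget on siblings (shallow and wide). A real system would replace the toy tables with draft-model logits and verify the resulting tree with the target model.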

Tianyu Liu, Qitan Lv, Yuhao Shen, Xiao Sun, Xiaoyan Sun • 2026

Related benchmarks

Task                      Dataset     Metric          Result  Rank
Mathematical Reasoning    GSM8K       Speed Up (x)    4.43    246
Instruction Following     Alpaca      Speedup (x)     4.04    111
Question Answering        QA          Speedup Factor  3.27    47
Summarization             CNN/DM      Speedup         3.58    32
Code Generation           HumanEval   MAT             9.48    14
Multi-turn conversation   MT-Bench    MAT Score       7.29    14
