Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Foundations of Top-$k$ Decoding For Language Models

About

Top-$k$ decoding is a widely used method for sampling from LLMs: at each token, only the largest $k$ next-token-probabilities are kept, and the next token is sampled after re-normalizing them to sum to unity. Top-$k$ and other sampling methods are motivated by the intuition that true next-token distributions are sparse, and the noisy LLM probabilities need to be truncated. However, to our knowledge, a precise theoretical motivation for the use of top-$k$ decoding is missing. In this work, we develop a theoretical framework that both explains and generalizes top-$k$ decoding. We view decoding at a fixed token as the recovery of a sparse probability distribution. We consider \emph{Bregman decoders} obtained by minimizing a separable Bregman divergence (for both the \emph{primal} and \emph{dual} cases) with a sparsity-inducing $\ell_0$ regularization. Despite the combinatorial nature of the objective, we show how to optimize it efficiently for a large class of divergences. We show that the optimal decoding strategies are greedy, and further that the loss function is discretely convex in $k$, so that binary search provably and efficiently finds the optimal $k$. We show that top-$k$ decoding arises as a special case for the KL divergence, and identify new decoding strategies that have distinct behaviors (e.g., non-linearly up-weighting larger probabilities after re-normalization).

Georgy Noarov, Soham Mallick, Tao Wang, Sunay Joshi, Yan Sun, Yangxinyu Xie, Mengxin Yu, Edgar Dobriban• 2025

Related benchmarks

TaskDatasetResultRank
Mathematical ReasoningGSM8K (test)
Accuracy85.14
954
Question AnsweringTriviaQA (test)
Accuracy67.8
121
Question AnsweringTriviaQA
ACC59.67
62
Code GenerationHumanEval
Accuracy20
12
Scientific problem solvingSciBench
Accuracy3
12
Mathematical ReasoningMATH500
Accuracy20
12
Code GenerationHumanEval
Accuracy27.6
10
Scientific ReasoningSciBench
Accuracy4.9
10
Mathematical ReasoningGSM8K
Accuracy70.7
10
Mathematical ReasoningMATH500
Accuracy23
10
Showing 10 of 10 rows

Other info

Follow for update