
Foundations of Top-$k$ Decoding For Language Models

About

Top-$k$ decoding is a widely used method for sampling from LLMs: at each token, only the $k$ largest next-token probabilities are kept, and the next token is sampled after re-normalizing them to sum to one. Top-$k$ and other sampling methods are motivated by the intuition that true next-token distributions are sparse, so the noisy LLM probabilities need to be truncated. However, to our knowledge, a precise theoretical motivation for the use of top-$k$ decoding is missing. In this work, we develop a theoretical framework that both explains and generalizes top-$k$ decoding. We view decoding at a fixed token as the recovery of a sparse probability distribution. We consider \emph{Bregman decoders} obtained by minimizing a separable Bregman divergence (in both the \emph{primal} and \emph{dual} cases) with a sparsity-inducing $\ell_0$ regularization. Despite the combinatorial nature of the objective, we show how to optimize it efficiently for a large class of divergences. We show that the optimal decoding strategies are greedy, and further that the loss function is discretely convex in $k$, so that binary search provably and efficiently finds the optimal $k$. We show that top-$k$ decoding arises as a special case for the KL divergence, and we identify new decoding strategies with distinct behaviors (e.g., non-linearly up-weighting larger probabilities after re-normalization).
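
For concreteness, here is a minimal NumPy sketch (not code from the paper) of standard top-$k$ decoding as described above, plus a generic binary search over $k$ of the kind that the discrete-convexity result makes possible. The `loss` callable is a placeholder for a discretely convex objective in $k$; the paper's actual Bregman objective is not reproduced here.

```python
import numpy as np

def top_k_sample(probs, k, rng=None):
    """Standard top-k decoding: keep the k largest next-token probabilities,
    renormalize them to sum to one, and sample the next token from the result."""
    rng = rng or np.random.default_rng()
    probs = np.asarray(probs, dtype=float)
    # Indices of the k largest probabilities (their relative order is irrelevant).
    top = np.argpartition(probs, -k)[-k:]
    renorm = probs[top] / probs[top].sum()
    return rng.choice(top, p=renorm)

def optimal_k(loss, vocab_size):
    """Binary search for the minimizer of a discretely convex loss over k in
    {1, ..., vocab_size}. `loss` is any callable k -> float whose forward
    differences loss(k+1) - loss(k) are nondecreasing in k (a placeholder for
    the paper's Bregman objective)."""
    lo, hi = 1, vocab_size
    while lo < hi:
        mid = (lo + hi) // 2
        if loss(mid + 1) >= loss(mid):
            # Forward difference is already nonnegative: minimizer is at or before mid.
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Under these assumptions, `optimal_k` takes $O(\log V)$ loss evaluations, and `top_k_sample(probs, optimal_k(loss, len(probs)))` would combine the two steps; any resemblance to the paper's exact algorithm is only schematic.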

Georgy Noarov, Soham Mallick, Tao Wang, Sunay Joshi, Yan Sun, Yangxinyu Xie, Mengxin Yu, Edgar Dobriban • 2025

Related benchmarks

Task                     Dataset           Metric     Result   Rank
Mathematical Reasoning   GSM8K (test)      Accuracy   85.14    797
Question Answering       TriviaQA (test)   Accuracy   67.8     121
Question Answering       TriviaQA          Accuracy   59.67    62
