
Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation

About

Large language models (LLMs), despite their impressive performance across a wide range of tasks, often struggle to balance two competing objectives in open-ended text generation: fostering diversity and creativity while preserving logical coherence. Existing truncated sampling techniques, including temperature scaling, top-$p$ (nucleus) sampling, and min-$p$ sampling, aim to manage this trade-off. However, they exhibit limitations, particularly in the effective incorporation of the confidence of the model into the corresponding sampling strategy. For example, min-$p$ sampling relies on a single top token as a heuristic for confidence, eventually underutilizing the information of the probability distribution. Toward effective incorporation of the confidence of the model, in this paper, we present **top-H** decoding. We first establish the theoretical foundation of the interplay between creativity and coherence in truncated sampling by formulating an **entropy-constrained minimum divergence** problem. We then prove this minimization problem to be equivalent to an **entropy-constrained mass maximization** (ECMM) problem, which is NP-hard. Finally, we present top-H decoding, a computationally efficient greedy algorithm to solve the ECMM problem. Extensive empirical evaluations demonstrate that top-H outperforms the state-of-the-art (SoTA) alternative of min-$p$ sampling by up to **25.63%** on creative writing benchmarks, while maintaining robustness on question-answering datasets such as GPQA, GSM8K, and MT-Bench. Additionally, an *LLM-as-judge* evaluation confirms that top-H indeed produces coherent outputs even at higher temperatures, where creativity is especially critical. In summary, top-H advances SoTA in open-ended text generation and can be *easily integrated* into creative writing applications. The code is available at https://github.com/ErfanBaghaei/Top-H-Decoding.
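To make the greedy idea concrete, here is a minimal Python sketch of an entropy-bounded truncation step. The abstract does not give the exact procedure, so several details are assumptions: the entropy bound is taken to be a fraction `alpha` of the full distribution's entropy, tokens are added in descending probability order, and the top token is always kept. The names `entropy`, `top_h_filter`, `sample_top_h`, and the parameter `alpha` are hypothetical and are not taken from the authors' repository, which remains the reference for the actual implementation.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def top_h_filter(probs, alpha=0.4):
    """Greedy sketch of entropy-constrained mass maximization.

    Keeps the most probable tokens for as long as the entropy of the
    renormalized kept set stays within alpha * H(probs).
    ASSUMPTION: both the form of the bound and `alpha` are illustrative.
    Returns (kept_indices, renormalized_probs).
    """
    budget = alpha * entropy(probs)
    # Visit tokens from most to least probable (ties keep index order).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    kept = [order[0]]  # always keep the top token so sampling is possible
    for idx in order[1:]:
        if probs[idx] <= 0.0:
            break  # zero-probability tokens add no mass
        candidate = kept + [idx]
        mass = sum(probs[i] for i in candidate)
        if entropy([probs[i] / mass for i in candidate]) > budget:
            break  # adding this token would exceed the entropy budget
        kept = candidate

    mass = sum(probs[i] for i in kept)
    return kept, [probs[i] / mass for i in kept]


def sample_top_h(probs, alpha=0.4, rng=random):
    """Sample one token index from the truncated, renormalized distribution."""
    kept, renorm = top_h_filter(probs, alpha)
    return rng.choices(kept, weights=renorm, k=1)[0]
```

Under these assumptions, the whole distribution sets the budget rather than the top token alone, which is the contrast with min-$p$ that the abstract draws. A larger `alpha` admits more tokens: `top_h_filter([0.4, 0.3, 0.2, 0.1], alpha=0.8)` keeps the two most probable tokens, while `alpha=0.4` keeps only the first.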

Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu, Massoud Pedram • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Mathematical Reasoning | GSM8K (test) | Accuracy: 83.24 | 954 |
| Multi-turn Dialogue Evaluation | MT-Bench | Overall Score: 6.819 | 532 |
| Instruction Following | MT-Bench | MT-Bench Score: 7.14 | 287 |
| Question Answering | GPQA | Accuracy: 32.81 | 258 |
| Open-ended generation | Creative Writing Evaluation Prompts | Average Judge Score: 7.88 | 108 |
| Creative Writing | Creative Writing Evaluation Set 1.0 (test) | Creativity: 8.6 | 54 |
| Reasoning | GPQA | Accuracy: 32.37 | 37 |
| Creative Storytelling | Creative Storytelling Prompts | Creativity: 8.85 | 27 |
| Reasoning | GPQA | Accuracy: 32.37 | 21 |
| Code Generation | HumanEval | Accuracy: 27 | 10 |
Showing 10 of 15 rows
