
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

About

Entropy minimization (EM) trains a model to concentrate even more probability mass on its most confident outputs. We show that this simple objective alone, without any labeled data, can substantially improve the performance of large language models (LLMs) on challenging math, physics, and coding tasks. We explore three approaches: (1) EM-FT minimizes token-level entropy, similarly to instruction finetuning, but on unlabeled outputs sampled from the model; (2) EM-RL performs reinforcement learning with negative entropy as the only reward to maximize; (3) EM-INF adjusts logits at inference time to reduce entropy, without any training data or parameter updates. On Qwen-7B, EM-RL without any labeled data achieves comparable or better performance than strong RL baselines such as GRPO and RLOO, which are trained on 60K labeled examples. Furthermore, EM-INF enables Qwen-32B to match or exceed the performance of proprietary models such as GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro on the challenging SciCode benchmark, while being 3x more efficient than self-consistency and sequential refinement. Our findings reveal that many pretrained LLMs possess previously underappreciated reasoning capabilities that can be elicited effectively through entropy minimization alone, without labeled data or even parameter updates.
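The three variants above all hinge on the entropy of the model's next-token distribution. The sketch below is a minimal, illustrative take on that quantity under stated assumptions, not the authors' implementation. It works on raw logit lists in pure Python rather than a real model. The names `em_ft_loss`, `em_rl_reward`, and `adjust_logits` are hypothetical. The EM-RL reward here is assumed to be the negative mean token-level entropy; the paper's exact estimator may differ. The EM-INF step is assumed to be plain gradient descent on the logits, stopping once entropy falls below a threshold.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a single vector of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(logits: List[float]) -> float:
    """Shannon entropy (in nats) of the softmax distribution over the logits."""
    p = softmax(logits)
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def em_ft_loss(step_logits: List[List[float]]) -> float:
    """Hypothetical EM-FT-style loss: mean token-level entropy over the
    decoding steps of one unlabeled, model-sampled output."""
    return sum(entropy(z) for z in step_logits) / len(step_logits)


def em_rl_reward(step_logits: List[List[float]]) -> float:
    """Hypothetical EM-RL-style reward: negative mean token-level entropy
    (assumed estimator; no ground-truth label is used)."""
    return -em_ft_loss(step_logits)


def adjust_logits(logits: List[float], lr: float = 0.1,
                  steps: int = 20, threshold: float = 0.0) -> List[float]:
    """Hypothetical EM-INF-style adjustment: gradient descent on the logits
    of one decoding step to lower entropy; model weights are never touched."""
    z = list(logits)
    for _ in range(steps):
        p = softmax(z)
        h = -sum(pi * math.log(pi) for pi in p if pi > 0)
        if h <= threshold:  # already confident enough, stop early
            break
        # Gradient of entropy w.r.t. logit i: dH/dz_i = -p_i * (log p_i + H)
        grad = [-pi * (math.log(pi) + h) if pi > 0 else 0.0 for pi in p]
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z


if __name__ == "__main__":
    step = [2.0, 1.0, 0.5]
    print("entropy before:", entropy(step))
    print("entropy after: ", entropy(adjust_logits(step)))
    print("EM-RL reward:  ", em_rl_reward([step, [0.0, 0.0, 0.0]]))
```

In this sketch the gradient step raises the logit of the most likely token relative to the rest, so the distribution sharpens while the top-ranked token stays on top. The `threshold` argument is an assumed knob that leaves already-confident steps unchanged.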

Shivam Agarwal, Zimin Zhang, Lifan Yuan, Jiawei Han, Hao Peng• 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Mathematical Reasoning | MATH 500 | Pass@1 | 80.2 | 153
Mathematical Reasoning | Minerva | Pass@1 | 36.8 | 55
Mathematical Reasoning | In-Distribution Reasoning Performance Suite (AIME, AMC, MATH-500, Minerva, Olympiad) | AIME 2024 Score | 17.7 | 30
Reasoning | Out-of-Domain Reasoning Suite | ARC-c Score | 76.5 | 29
Mathematical Reasoning | AIME 2025 | Avg@32 | 11.9 | 27
Mathematical Reasoning | Competition-level Math Benchmarks (AIME24, AIME25, AMC23, MATH500, Olympiad, Minerva) | AIME 24 Score | 7.5 | 21
Mathematical Reasoning | AMC | Avg@32 | 54.9 | 21
Science Reasoning | GPQA Diamond | Pass@1 | 33.8 | 21
Academic Reasoning | MMLU-Pro | Pass@1 | 44.5 | 15
Reasoning Generalization | Out-of-Distribution Avg | Avg Score (OOD) | 52.6 | 15
Showing 10 of 13 rows
