One-shot Entropy Minimization
About
We trained 13,440 large language models and found that entropy minimization requires only a single unlabeled example and 10 optimization steps to achieve performance improvements comparable to, or even greater than, those obtained using thousands of examples and carefully designed rewards in rule-based reinforcement learning. This striking result may prompt a rethinking of post-training paradigms for large language models. Our code is available at https://github.com/zitian-gao/one-shot-em.
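The core idea above is that minimizing the entropy of the model's own output distribution, with no labels at all, can serve as a training signal. The following is a toy sketch of that mechanism, not the authors' implementation: in the paper, entropy is computed over an LLM's token distributions for responses sampled from a single prompt, whereas here we simply run 10 steps of gradient descent on the entropy of one softmax distribution, using the analytic gradient dH/dz = -p * (log p + H).

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    # Shannon entropy H(p) = -sum_i p_i log p_i (natural log).
    return -np.sum(p * np.log(p))

def entropy_minimization(z, steps=10, lr=1.0):
    """Run `steps` gradient-descent updates on the logits z to
    reduce the entropy of softmax(z). Returns final logits and
    the entropy recorded before each step plus after the last."""
    history = []
    for _ in range(steps):
        p = softmax(z)
        H = entropy(p)
        history.append(H)
        grad = -p * (np.log(p) + H)  # analytic dH/dz for p = softmax(z)
        z = z - lr * grad
    history.append(entropy(softmax(z)))
    return z, history

# A mildly uncertain 4-way distribution: entropy should drop as the
# distribution sharpens toward its current argmax.
z0 = np.array([1.0, 0.8, 0.5, 0.2])
z_final, hist = entropy_minimization(z0)
```

The values here (a 4-way toy distribution, `lr=1.0`) are illustrative assumptions; the point is only that unsupervised entropy descent sharpens the distribution toward its most confident outcome, which is the behavior the paper exploits at the scale of an LLM's token distributions.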
Zitian Gao, Lynx Chen, Haoming Luo, Joey Zhou, Bryan Dai • 2025
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | AIME 2024 | Accuracy@16 | 18.1 | 36 |
| Scientific Reasoning | GPQA | Avg@16 | 32.8 | 28 |
| Mathematical Reasoning | AIME 2025 | Avg@16 | 6.2 | 28 |
| Mathematical Reasoning | AMC 2023 | Avg@16 | 48.9 | 28 |
| Mathematical Reasoning | MATH500 | -- | -- | 4 |
| Mathematical Reasoning | Minerva Math | -- | -- | 4 |
| Mathematical Reasoning | Olympiad Bench | -- | -- | 4 |
| Mathematical Reasoning | AMC23 | -- | -- | 4 |