
One-shot Entropy Minimization

About

We trained 13,440 large language models and found that entropy minimization requires only a single unlabeled example and 10 optimization steps to achieve performance improvements comparable to, or even greater than, those obtained using thousands of examples and carefully designed rewards in rule-based reinforcement learning. This striking result may prompt a rethinking of post-training paradigms for large language models. Our code is available at https://github.com/zitian-gao/one-shot-em.
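The objective behind entropy minimization is to sharpen the model's own token distributions on unlabeled inputs: the loss is the average Shannon entropy of the next-token distribution over generated tokens, with no labels or rewards involved. A minimal sketch of that loss (plain Python with hypothetical helper names, not the authors' implementation) might look like:

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over one token's logits."""
    m = max(logits)                               # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs)

def em_loss(step_logits):
    """Entropy-minimization objective: mean entropy across generated tokens.

    Minimizing this pushes the model toward confident (low-entropy)
    predictions using only its own outputs -- no labels or rewards.
    """
    return sum(token_entropy(l) for l in step_logits) / len(step_logits)

# A peaked distribution has lower entropy than a flat one, so gradient
# descent on em_loss drives logits toward confident predictions.
flat = [0.0, 0.0, 0.0, 0.0]
peaked = [5.0, 0.0, 0.0, 0.0]
assert token_entropy(peaked) < token_entropy(flat)
```

In practice this loss would be computed from the model's logits over its own sampled completion and backpropagated for the handful of steps the abstract describes; the sketch only illustrates the quantity being minimized.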

Zitian Gao, Lynx Chen, Haoming Luo, Joey Zhou, Bryan Dai · 2025

Related benchmarks

Task                   | Dataset        | Metric       | Result | Rank
-----------------------|----------------|--------------|--------|-----
Mathematical Reasoning | AIME 2024      | Accuracy@16  | 18.1   | 36
Scientific Reasoning   | GPQA           | Avg@16       | 32.8   | 28
Mathematical Reasoning | AIME 2025      | Avg@16       | 6.2    | 28
Mathematical Reasoning | AMC 2023       | Avg@16 Score | 48.9   | 28
Mathematical Reasoning | MATH500        | --           | --     | 4
Mathematical Reasoning | Minerva Math   | --           | --     | 4
Mathematical Reasoning | Olympiad Bench | --           | --     | 4
Mathematical Reasoning | AMC23          | --           | --     | 4
