LIMO: Less is More for Reasoning
About
We challenge the prevailing assumption that complex reasoning in large language models (LLMs) necessitates massive training data. We demonstrate that sophisticated mathematical reasoning can emerge from only a few hundred examples. Specifically, through simple supervised fine-tuning, our model, LIMO, achieves 63.3% accuracy on AIME24 and 95.6% on MATH500, surpassing previous fine-tuned models (6.5% on AIME24, 59.2% on MATH500) while using only 1% of the training data required by prior approaches. Furthermore, LIMO exhibits strong out-of-distribution generalization, achieving a 45.8% absolute improvement across diverse benchmarks and outperforming models trained on 100x more data.

Synthesizing these findings, we propose the Less-Is-More Reasoning Hypothesis (LIMO Hypothesis): in foundation models where domain knowledge has been comprehensively encoded during pre-training, sophisticated reasoning can emerge through minimal but strategically designed demonstrations of cognitive processes. This hypothesis suggests that the threshold for eliciting complex reasoning is dictated not by task complexity but by two key factors: (1) the completeness of the model's pre-trained knowledge base, and (2) the effectiveness of post-training examples in serving as "cognitive templates" that guide reasoning.
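To make the supervised fine-tuning setup concrete, the sketch below formats one curated reasoning demonstration into a prompt/completion training pair, where the completion carries the full reasoning trace (the "cognitive template") followed by the final answer. The field names, prompt wording, and helper function are illustrative assumptions, not LIMO's actual data pipeline.

```python
# Hypothetical sketch: turning a curated reasoning demonstration into a
# supervised fine-tuning (SFT) example. The exact prompt format and field
# names are assumptions for illustration, not the paper's released format.

def format_sft_example(problem: str, chain_of_thought: str, answer: str) -> dict:
    """Build a prompt/completion pair whose completion contains the full
    step-by-step reasoning trace followed by the final answer."""
    prompt = f"Problem: {problem}\nSolve the problem step by step.\n"
    completion = f"{chain_of_thought}\nFinal answer: {answer}"
    return {"prompt": prompt, "completion": completion}

# A toy demonstration; LIMO's actual examples are competition-level problems
# with much longer reasoning chains.
example = format_sft_example(
    problem="What is 12 * 13?",
    chain_of_thought="12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156.",
    answer="156",
)
print(example["completion"])
```

Under this framing, "less is more" means curating a small number of such pairs whose reasoning traces are complete and well-structured, rather than scaling up the number of pairs.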
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | AIME 2024 | Accuracy | 56.3 | 251 |
| Mathematical Reasoning | AIME 2025 | Accuracy | 45.4 | 227 |
| Science Reasoning | GPQA | Accuracy | 66.7 | 218 |
| Mathematical Reasoning | AMC 23 | Accuracy | 88.4 | 198 |
| Mathematical Reasoning | MATH 500 | Accuracy | 94.8 | 106 |
| Mathematical Reasoning | MATH L5 | Accuracy | 0.853 | 86 |
| Mathematical Reasoning | OlympiadBench Math | Accuracy | 34.9 | 84 |
| Mathematical Reasoning | Omni-MATH | Accuracy | 21.8 | 68 |
| Mathematical Reasoning | HMMT 2025 | Accuracy | 1.7 | 38 |
| Mathematical Reasoning | AIME 2025 | Accuracy | 8.8 | 37 |