LIMO: Less is More for Reasoning
About
We challenge the prevailing assumption that complex reasoning in large language models (LLMs) necessitates massive training data. We demonstrate that sophisticated mathematical reasoning can emerge from only a few hundred examples. Specifically, through simple supervised fine-tuning, our model, LIMO, achieves 63.3% accuracy on AIME24 and 95.6% on MATH500, surpassing previous fine-tuned models (6.5% on AIME24, 59.2% on MATH500) while using only 1% of the training data required by prior approaches. Furthermore, LIMO exhibits strong out-of-distribution generalization, achieving a 45.8% absolute improvement across diverse benchmarks and outperforming models trained on 100x more data.

Synthesizing these findings, we propose the Less-Is-More Reasoning Hypothesis (LIMO Hypothesis): in foundation models where domain knowledge has been comprehensively encoded during pre-training, sophisticated reasoning can emerge through minimal but strategically designed demonstrations of cognitive processes. This hypothesis suggests that the threshold for eliciting complex reasoning is dictated not by task complexity but by two key factors: (1) the completeness of the model's pre-trained knowledge base, and (2) the effectiveness of post-training examples in serving as "cognitive templates" that guide reasoning.
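To make the supervised fine-tuning setup concrete, the sketch below formats one curated reasoning demonstration into a prompt/completion training pair, where the completion carries the full reasoning trace (the "cognitive template") followed by the final answer. The field names, prompt wording, and helper function are illustrative assumptions, not LIMO's actual data pipeline.

```python
# Hypothetical sketch: turning a curated reasoning demonstration into a
# supervised fine-tuning (SFT) example. The exact prompt format and field
# names are assumptions for illustration, not the paper's released format.

def format_sft_example(problem: str, chain_of_thought: str, answer: str) -> dict:
    """Build a prompt/completion pair whose completion contains the full
    step-by-step reasoning trace followed by the final answer."""
    prompt = f"Problem: {problem}\nSolve the problem step by step.\n"
    completion = f"{chain_of_thought}\nFinal answer: {answer}"
    return {"prompt": prompt, "completion": completion}

# A toy demonstration; LIMO's actual examples are competition-level problems
# with much longer reasoning chains.
example = format_sft_example(
    problem="What is 12 * 13?",
    chain_of_thought="12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156.",
    answer="156",
)
print(example["completion"])
```

Under this framing, "less is more" means curating a small number of such pairs whose reasoning traces are complete and well-structured, rather than scaling up the number of pairs.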
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | AIME 2024 | Accuracy | 56.3 | 251 |
| Mathematical Reasoning | AIME 2025 | Accuracy | 45.4 | 227 |
| Science Reasoning | GPQA | Accuracy | 66.7 | 218 |
| Mathematical Reasoning | AMC 23 | Accuracy | 88.4 | 198 |
| Mathematical Reasoning | MATH 500 | Accuracy | 94.8 | 106 |
| Mathematical Reasoning | MATH L5 | Accuracy | 0.853 | 86 |
| Mathematical Reasoning | OlympiadBench Math | Accuracy | 34.9 | 84 |
| Mathematical Reasoning | Omni-MATH | Accuracy | 21.8 | 68 |
| Mathematical Reasoning | HMMT 2025 | Accuracy | 1.7 | 38 |
| Mathematical Reasoning | AIME 2025 | Accuracy | 8.8 | 37 |