Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
About
This paper introduces Light-R1, an open-source suite for training long reasoning models using a reproducible and cost-effective methodology. Given the proprietary nature of the data used in the DeepSeek-R1 series, we develop an alternative approach that relies exclusively on public data and models. Our curriculum training progressively increases data difficulty and is combined with multi-stage post-training. Our Light-R1-32B model, trained from Qwen2.5-32B-Instruct, outperforms DeepSeek-R1-Distill-Qwen-32B in math reasoning. Experimental results show that this curriculum approach becomes more effective when distinct, diverse datasets are available for different training stages: fine-tuning the DeepSeek-R1-Distill models (pre-tuned by the DeepSeek team on proprietary data) with 3,000 challenging examples from our curriculum dataset yielded state-of-the-art 7B and 14B models, while the 32B model, Light-R1-32B-DS, performed comparably to QwQ-32B and DeepSeek-R1. Furthermore, we extend our work by applying GRPO to long reasoning models. Our final Light-R1-14B-DS achieves state-of-the-art performance among 14B models in math, with AIME24 and AIME25 scores of 74.0 and 60.2 respectively, surpassing many 32B models and DeepSeek-R1-Distill-Llama-70B. Despite math-focused training, Light-R1-14B-DS demonstrates strong cross-domain generalization. Light-R1 represents a significant advance in making sophisticated reasoning models more accessible and easier to implement in real-world applications. Our models, training data and code are available at https://github.com/Qihoo360/Light-R1.
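The abstract names GRPO without describing it. As a rough illustration only, the core idea of GRPO as commonly described is to score each sampled answer relative to the other answers drawn for the same prompt. The sketch below is not taken from the Light-R1 code: the function name `group_relative_advantages`, the `eps` term, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per sampled answer to a single prompt.
    Using the population std and adding `eps` are assumptions of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every answer gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one math prompt
# (1.0 = verified correct, 0.0 = incorrect).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

In this sketch, correct answers receive positive advantages and incorrect ones negative, and a group where every answer earns the same reward contributes zero advantage.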
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | MATH 500 | pass@1 | 87.82 | 239 |
| Mathematical Reasoning | AIME 2024 | Pass@1 Accuracy | 36.27 | 165 |
| Mathematical Reasoning | AIME 2025 | Pass@1 Accuracy | 33.33 | 118 |
| Mathematical Reasoning | Omni-MATH | Accuracy | 48.5 | 93 |
| Mathematical Reasoning | OlympiadBench Math | Accuracy | 69.7 | 84 |
| Mathematical Reasoning | HMMT 2025 | Accuracy | 25 | 70 |
| Mathematical Reasoning | AIME 2025 | Accuracy | 31.3 | 59 |
| Mathematical Reasoning | AMC 23 | Accuracy | 78.25 | 56 |
| Mathematical Reasoning | AIME 24 | Rank | 4 | 40 |
| Mathematical Reasoning | Brumo 25 | Rank | 4 | 40 |