AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset

About

This paper presents our winning submission to the AI Mathematical Olympiad - Progress Prize 2 (AIMO-2) competition. Our recipe for building state-of-the-art mathematical reasoning models rests on three pillars. First, we create a large-scale dataset of 540K unique, high-quality math problems, including olympiad-level problems, together with 3.2M long-reasoning solutions. Second, we develop a novel method to integrate code execution with long-reasoning models through iterative training, generation, and quality filtering, which yields 1.7M high-quality Tool-Integrated Reasoning solutions. Third, we build a pipeline that trains models to select the most promising solution from many candidates. We show that this generative solution selection (GenSelect) significantly improves upon the majority-voting baseline. Combining these ideas, we train a series of models that achieve state-of-the-art results on mathematical reasoning benchmarks. To facilitate further research, we release our code, models, and the complete OpenMathReasoning dataset under a commercially permissive license.
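For reference, the majority-voting baseline mentioned above picks the final answer that occurs most often among the sampled candidate solutions. The minimal Python sketch below assumes each candidate has already been reduced to a final-answer string; the function name `majority_vote` is hypothetical and is not taken from the released code. GenSelect replaces this counting step with a trained model that reads the candidate solutions and chooses one, which is not shown here.

```python
from __future__ import annotations

from collections import Counter


def majority_vote(answers: list[str]) -> str | None:
    """Return the most frequent final answer among candidate solutions.

    Hypothetical helper illustrating the majority-voting baseline.
    Empty or whitespace-only answers are ignored. When counts tie,
    the answer that appears first in `answers` wins, because
    Counter.most_common(1) returns the first maximal element.
    """
    # Light normalization: drop missing answers and surrounding whitespace.
    valid = [a.strip() for a in answers if a and a.strip()]
    if not valid:
        return None
    answer, _count = Counter(valid).most_common(1)[0]
    return answer


# Six sampled solutions whose final answers disagree; "12" occurs most often.
print(majority_vote(["12", "15", "12", "7", "15", "12"]))
```

Counting exact string matches is the simplest choice; a real pipeline would likely normalize mathematically equivalent answers (for example `1/2` and `0.5`) before voting, which this sketch does not attempt.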

Ivan Moshkov, Darragh Hanley, Ivan Sorokin, Shubham Toshniwal, Christof Henkel, Benedikt Schifferer, Wei Du, Igor Gitman • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | MATH 500 | Accuracy (Acc) | 92.9 | 543 |
| Mathematical Reasoning | AIME 2024 | Accuracy | 87.9 | 370 |
| Mathematical Reasoning | AIME 24 | Accuracy | 64.06 | 318 |
| Mathematical Reasoning | Minerva | Pass@1 Accuracy | 22.3 | 289 |
| Mathematical Reasoning | MATH 500 | Pass@1 | 95.55 | 239 |
| Mathematical Reasoning | AIME 2025 | Accuracy | 86.1 | 227 |
| Mathematical Reasoning | OlympiadBench | Accuracy | 74.09 | 213 |
| Mathematical Reasoning | AMC23 | Pass@1 Accuracy | 82.7 | 207 |
| Mathematical Reasoning | AIME 25 | Pass@1 Accuracy | 50.1 | 178 |
| Mathematical Reasoning | Minerva | Pass@1 | 33.46 | 138 |

(10 of 45 rows shown.)
