Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
About
Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Mathematical Olympiad (IMO) and International Physics Olympiad (IPhO) problems. In this paper, we introduce a simple and unified recipe for converting a post-trained reasoning backbone into a rigorous olympiad-level solver. The recipe first applies supervised fine-tuning (SFT) with a reverse-perplexity curriculum to instill rigorous proof-search and self-checking behaviors. It then scales these behaviors through a two-stage reinforcement learning (RL) pipeline that progresses from RL with verifiable rewards to more delicate proof-level RL, and finally boosts solving performance with test-time scaling. Applying this recipe, we train a 30B-A3B backbone with SFT on around 340K trajectories, each shorter than 8K tokens, followed by 200 RL steps. The resulting model, SU-01, reasons stably on difficult problems over trajectories exceeding 100K tokens and achieves gold-medal-level performance on mathematical and physics olympiad competitions, including IMO 2025, USAMO 2026, IPhO 2024, and IPhO 2025. It also generalizes its scientific reasoning to domains beyond mathematics and physics.
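The abstract does not say how the reverse-perplexity curriculum is built, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes that "reverse-perplexity" means ordering SFT trajectories from highest to lowest perplexity under some reference model, and that "sub-8K-token" means fewer than 8,192 tokens. The `Trajectory` record, `perplexity`, and `reverse_perplexity_curriculum` names are hypothetical, and the per-token log-probabilities are taken as given inputs.

```python
import math
from dataclasses import dataclass


@dataclass
class Trajectory:
    # Hypothetical record: one SFT trajectory plus the per-token
    # log-probabilities a reference model assigns to it (assumed input).
    text: str
    token_logprobs: list[float]


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-probability per token)."""
    if not token_logprobs:
        raise ValueError("empty trajectory")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def reverse_perplexity_curriculum(
    trajectories: list[Trajectory], max_tokens: int = 8192
) -> list[Trajectory]:
    """Keep trajectories shorter than max_tokens (ASSUMED 8K cutoff) and
    order them from highest to lowest perplexity (ASSUMED meaning of "reverse")."""
    kept = [t for t in trajectories if len(t.token_logprobs) < max_tokens]
    return sorted(kept, key=lambda t: perplexity(t.token_logprobs), reverse=True)


# Usage with toy log-probabilities (illustrative values, not paper data).
data = [
    Trajectory("easy", [-0.1, -0.1]),
    Trajectory("hard", [-2.0, -2.0]),
    Trajectory("too long", [-0.5] * 9000),  # dropped by the length filter
]
order = [t.text for t in reverse_perplexity_curriculum(data)]
print(order)  # ['hard', 'easy']
```

Python's `sorted` is stable, so trajectories with equal perplexity keep their original relative order; whether the paper orders high-to-low, bins by perplexity, or uses another schedule is not stated in the abstract.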
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | AIME 2025 | Accuracy | 94.6 | 311 |
| Mathematical Reasoning | AIME 2026 | Accuracy | 93.3 | 55 |
| Scientific Olympiad Reasoning | FrontierScience-Olympiad | Biology accuracy | 25 | 30 |
| General Reasoning | AVG Reasoning Suite | Accuracy | 77.3 | 18 |
| Physics Olympiad Reasoning | IPhO 2024 | Points | 25.3 | 10 |
| Physics Olympiad Reasoning | IPhO 2025 | Points | 21.7 | 10 |
| Mathematical Reasoning | AMO-Bench | Accuracy | 59.8 | 6 |
| Reasoning | AnswerBench | Accuracy | 77.5 | 6 |