
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

About

Reinforcement learning (RL) with large language models shows promise in complex reasoning. However, its progress is hindered by the lack of large-scale training data that is sufficiently challenging, contamination-free, and verifiable. To this end, we introduce DeepMath-103K, a large-scale mathematical dataset designed with high difficulty (primarily levels 5-9), rigorous decontamination against numerous benchmarks, and verifiable answers for rule-based RL rewards. It further includes three distinct R1 solutions per problem, adaptable to diverse training paradigms such as supervised fine-tuning (SFT). Spanning a wide range of mathematical topics, DeepMath-103K supports the development of generalizable, advanced reasoning. Notably, models trained on DeepMath-103K achieve state-of-the-art results on challenging mathematical benchmarks and generalize beyond math to domains such as biology, physics, and chemistry, underscoring its broad efficacy. Data: https://huggingface.co/datasets/zwhe99/DeepMath-103K

Zhiwei He, Tian Liang, Jiahao Xu, Qiuzhi Liu, Xingyu Chen, Yue Wang, Linfeng Song, Dian Yu, Zhenwen Liang, Wenxuan Wang, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu • 2025
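
The abstract mentions verifiable answers for a rule-based RL reward. As a rough illustration of what such a reward can look like, here is a minimal sketch, not the authors' verifier: it assumes the model writes its final answer inside \boxed{...} and that a normalized string comparison against the reference answer is good enough. The function names (extract_boxed, normalize, rule_based_reward) are hypothetical and chosen for this example only.

```python
import re
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    # Return the contents of the last \boxed{...} in the text, or None.
    # Braces are matched by depth so nested LaTeX such as \frac{1}{2} survives.
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # unbalanced braces: treat as no answer


def normalize(answer: str) -> str:
    # Assumption: dropping whitespace, surrounding '$', and a trailing period
    # is enough normalization for this sketch. A real verifier would likely
    # check mathematical equivalence rather than compare strings.
    answer = answer.strip().strip("$").rstrip(".")
    return re.sub(r"\s+", "", answer)


def rule_based_reward(model_output: str, reference: str) -> float:
    # Binary reward: 1.0 if the boxed answer matches the reference, else 0.0.
    predicted = extract_boxed(model_output)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0
```

A reward of this shape needs no learned reward model, which is why answers that can be checked mechanically matter for RL training. String matching will reject equivalent forms such as 0.5 versus \frac{1}{2}, so a symbolic equivalence check would be a natural next step.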

Related benchmarks

Task                          Dataset                  Result                      Rank
Mathematical Reasoning        OlympiadBench Math       Accuracy 60.2               84
Mathematical Reasoning        Omni-MATH                Accuracy 45.4               68
Mathematical Reasoning        HMMT 2025                Accuracy 11.7               38
Mathematical Reasoning        AIME 2025                Accuracy 31.7               37
Mathematical Problem Solving  IneqMath (IM)            Exact Match Accuracy 76     12
Mathematical Problem Solving  Putnam-Axiom (PA)        Exact Match Accuracy 39.1   12
Mathematical Problem Solving  MATH-Perturb MP-hard     Exact Match Accuracy 53     12
Mathematical Problem Solving  MATH-Perturb MP-simple   Exact Match Accuracy 72     12
Mathematical Problem Solving  TheoremQA TQ-Math        Exact Match Accuracy 55.4   12
Lemma Judging                 NaturalProofs (test)     Exact Match Accuracy 60.8   12
Showing 10 of 15 rows
