DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
About
Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B achieves an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits or voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO) that enhances mathematical reasoning abilities while concurrently optimizing the memory usage of PPO.
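The abstract does not spell out the GRPO objective, so the following is only a minimal PyTorch sketch of the group-relative idea: for each question, rewards of a group of sampled outputs are normalized by the group mean and standard deviation to form advantages (in place of a learned value function), which are then plugged into a PPO-style clipped surrogate. The function names, tensor shapes, and the clipping constant are assumptions chosen for illustration, not the paper's implementation; the paper's full objective additionally includes a KL penalty against a reference policy.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages for one question.

    rewards: shape (G,), scalar rewards of G outputs sampled for the same question.
    Each output's advantage is its reward normalized by the group mean and std,
    so no separate value (critic) network is needed.
    """
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def grpo_clipped_loss(logprobs: torch.Tensor,
                      old_logprobs: torch.Tensor,
                      advantages: torch.Tensor,
                      clip_eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped surrogate with group-relative advantages.

    logprobs, old_logprobs: shape (G, T), per-token log-probabilities of the
        sampled outputs under the current policy and the sampling policy.
    advantages: shape (G,), one group-relative advantage per output,
        broadcast across that output's tokens.
    """
    ratio = torch.exp(logprobs - old_logprobs)                 # importance ratios
    adv = advantages.unsqueeze(-1)                             # (G, 1) -> broadcast over tokens
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Maximize the clipped surrogate => minimize its negative mean.
    return -torch.min(unclipped, clipped).mean()

# Toy usage: 4 sampled answers to one question, binary correctness rewards.
rewards = torch.tensor([1.0, 0.0, 1.0, 0.0])
adv = grpo_advantages(rewards)
logp_new = torch.randn(4, 8)       # placeholder per-token log-probs
logp_old = logp_new.detach() - 0.01
loss = grpo_clipped_loss(logp_new, logp_old, adv)
```

Because advantages come from within-group comparison rather than a critic, the memory and compute of training a separate value model are avoided, which is the efficiency gain over standard PPO noted above.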
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 92.8 | 1362 |
| Code Generation | HumanEval | Pass@1 | 94.71 | 1036 |
| Question Answering | ARC Challenge | Accuracy | 52 | 906 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 86.7 | 900 |
| Mathematical Reasoning | MATH | Accuracy | 87.8 | 882 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 88.2 | 770 |
| Robot Manipulation | LIBERO | Goal Achievement | 10.6 | 700 |
| Reasoning | BBH | Accuracy | 86.1 | 672 |
| Instruction Following | IFEval | IFEval Accuracy | 85 | 625 |
| Mathematical Reasoning | MATH | Accuracy | 87.8 | 535 |