
MathScale: Scaling Instruction Tuning for Mathematical Reasoning

About

Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving. However, their proficiency in solving mathematical problems remains inadequate. We propose MathScale, a simple and scalable method to create high-quality mathematical reasoning data using frontier LLMs (e.g., GPT-3.5). Inspired by the cognitive mechanism in human mathematical learning, it first extracts topics and knowledge points from seed math questions and then builds a concept graph, which is subsequently used to generate new math questions. MathScale exhibits effective scalability along the size axis of the math dataset that we generate. As a result, we create a mathematical reasoning dataset (MathScaleQA) containing two million math question-answer pairs. To evaluate the mathematical reasoning abilities of LLMs comprehensively, we construct MwpBench, a benchmark of Math Word Problems, which is a collection of ten datasets (including GSM8K and MATH) covering K-12, college, and competition-level math problems. We apply MathScaleQA to fine-tune open-source LLMs (e.g., LLaMA-2 and Mistral), resulting in significantly improved capabilities in mathematical reasoning. Evaluated on MwpBench, MathScale-7B achieves state-of-the-art performance across all datasets, surpassing its best peers of equivalent size by 42.9% in micro average accuracy and 43.7% in macro average accuracy, respectively.
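The abstract reports both micro and macro average accuracy across the datasets in MwpBench. As a minimal sketch of how the two usually differ: micro averaging pools every question across datasets, while macro averaging takes the mean of the per-dataset accuracies. The function name, the dataset names, and the counts below are hypothetical, chosen only for illustration. They are not results or code from the paper.

```python
from typing import Dict, Tuple


def micro_macro_accuracy(results: Dict[str, Tuple[int, int]]) -> Tuple[float, float]:
    """Return (micro, macro) accuracy.

    `results` maps a dataset name to (num_correct, num_total).
    """
    total_correct = sum(correct for correct, _ in results.values())
    total_items = sum(total for _, total in results.values())

    # Micro average: pool all questions, so large datasets weigh more.
    micro = total_correct / total_items

    # Macro average: mean of per-dataset accuracies, so each dataset
    # counts equally regardless of its size.
    macro = sum(correct / total for correct, total in results.values()) / len(results)

    return micro, macro


# Hypothetical counts for illustration only (NOT taken from the paper).
toy_results = {
    "dataset_a": (80, 100),  # 80% accuracy on 100 questions
    "dataset_b": (10, 50),   # 20% accuracy on 50 questions
}
micro, macro = micro_macro_accuracy(toy_results)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

In this toy case the two averages diverge because the datasets differ in size, which is presumably why the abstract reports both.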

Zhengyang Tang, Xingxing Zhang, Benyou Wang, Furu Wei • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy 74.8 | 983 |
| Mathematical Reasoning | GSM8K (test) | Accuracy 74.8 | 751 |
| Mathematical Reasoning | MATH | Accuracy 35.2 | 643 |
| Mathematical Reasoning | MATH (test) | Overall Accuracy 35.2 | 433 |
| Mathematical Reasoning | GSM8K | -- | 351 |
| Mathematical Reasoning | CollegeMATH | Accuracy 21.8 | 161 |
| Mathematical Reasoning | Olympiad Bench | Pass@1 Accuracy 36.4 | 115 |
| Mathematical Reasoning | MATH 500 | -- | 106 |
| Mathematical Reasoning | CollegeMath (test) | Accuracy 21.8 | 61 |
| Mathematical Reasoning | GSM-Hard | GSM-Hard pass@1 Acc 67.6 | 27 |
Showing 10 of 12 rows
