
Solving Quantitative Reasoning Problems with Language Models

About

Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content. The model achieves state-of-the-art performance on technical benchmarks without the use of external tools. We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences that require quantitative reasoning, and find that the model can correctly answer nearly a third of them.

Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Mathematical Reasoning | GSM8K | Accuracy | 58.8 | 983 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 58.8 | 751 |
| Mathematical Reasoning | MATH | Accuracy | 33.6 | 535 |
| Reasoning | BBH | Accuracy | 37.2 | 507 |
| Mathematical Reasoning | MATH (test) | Overall Accuracy | 50.3 | 433 |
| Mathematical Reasoning | SVAMP | Accuracy | 89.1 | 368 |
| Mathematical Reasoning | GSM8K | Accuracy (GSM8K) | 58.8 | 358 |
| Math Reasoning | GSM8K (test) | Accuracy | 78.5 | 155 |
| Mathematical Reasoning | MATH (test) | Pass@1 | 33.6 | 151 |
| Mathematical Reasoning | AQUA | Accuracy | 76.4 | 132 |

Showing 10 of 28 rows.
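The accuracies above are exact-match scores: a problem counts as solved only if the model's final answer, after light normalization, matches the reference answer. As a rough illustration (not the paper's actual scoring pipeline — the `normalize` heuristic here is an assumption for demonstration), the computation looks like:

```python
import re

def normalize(ans: str) -> str:
    # Crude normalization: strip whitespace, lowercase, drop a trailing period.
    # Illustrative only; real benchmark scorers use more careful answer parsing.
    return re.sub(r"\s+", "", ans).lower().rstrip(".")

def exact_match_accuracy(predictions, references):
    """Fraction of problems whose predicted final answer matches the reference."""
    assert len(predictions) == len(references)
    correct = sum(normalize(p) == normalize(r)
                  for p, r in zip(predictions, references))
    return correct / len(predictions)

# "x + 1" matches "x+1" after normalization; "1/2" does not match "0.5"
# because no symbolic equivalence check is done in this simple sketch.
print(exact_match_accuracy(["1/2", "42", "x + 1"], ["0.5", "42", "x+1"]))
```

Pass@1 is the same idea applied to a single sampled solution per problem; metrics like Pass@k instead count a problem as solved if any of k samples is correct.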
