Llemma: An Open Language Model For Mathematics

About

We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.

Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck • 2023

Related benchmarks

Task | Dataset | Result | Rank
Mathematical Reasoning | GSM8K | Accuracy 54 | 983
Mathematical Reasoning | GSM8K (test) | Accuracy 64.6 | 751
Mathematical Reasoning | MATH | Accuracy 18 | 535
Mathematical Reasoning | MATH (test) | Overall Accuracy 25 | 433
Mathematical Reasoning | GSM8K | Accuracy (GSM8K) 36.4 | 358
Language Understanding | MMLU 5-shot | -- | 132
Formal Theorem Proving | MiniF2F (test) | Pass@1 26.23 | 100
Mathematical Problem Solving | Gaokao MathQA | Accuracy 26.2 | 30
Mathematical Reasoning | Mathematical Reasoning Evaluation Harness GSM8K, MATH, SVAMP, ASDiv, MAWPS, TAB, MQA, SAT (test) | GSM8K Accuracy 39.7 | 28
Arithmetic Computation | MATH | Pass@1 18.6 | 27
Showing 10 of 33 rows
