Llemma: An Open Language Model For Mathematics
About
We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.
Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck • 2023
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 54 | 983 |
| Mathematical Reasoning | GSM8K (test) | Accuracy | 64.6 | 751 |
| Mathematical Reasoning | MATH | Accuracy | 18 | 535 |
| Mathematical Reasoning | MATH (test) | Overall Accuracy | 25 | 433 |
| Mathematical Reasoning | GSM8K | Accuracy | 36.4 | 358 |
| Language Understanding | MMLU 5-shot | -- | -- | 132 |
| Formal Theorem Proving | MiniF2F (test) | Pass@1 | 26.23 | 100 |
| Mathematical Problem Solving | Gaokao MathQA | Accuracy | 26.2 | 30 |
| Mathematical Reasoning | Mathematical Reasoning Evaluation Harness GSM8K, MATH, SVAMP, ASDiv, MAWPS, TAB, MQA, SAT (test) | GSM8K Accuracy | 39.7 | 28 |
| Arithmetic Computation | MATH | Pass@1 | 18.6 | 27 |
Showing 10 of 33 rows
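Several rows above report Pass@1 (e.g. MiniF2F). The page does not state how these were computed, but Pass@k scores are commonly estimated with the unbiased estimator introduced for code-generation benchmarks: given n samples per problem of which c are correct, the probability that at least one of k drawn samples is correct. A minimal sketch (the function name and example numbers are illustrative, not taken from the Llemma evaluation):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n-c, k) / C(n, k).

    n: total samples generated per problem
    c: number of those samples that were correct
    k: budget of attempts being scored
    """
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset
        # must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers: 10 samples, 3 correct, scored at k=1.
print(pass_at_k(10, 3, 1))  # 0.3
```

Per-problem estimates are then averaged over the benchmark; for k=1 the estimator reduces to the fraction of correct samples, c/n.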