Metamath

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mathematical Reasoning | MetaMath 1k | Token Count: 212 | 14 |
| Automated Theorem Proving | Metamath (val) | Performance: 56.5 | 6 |
| Formal Theorem Proving | Metamath set.mm (val) | Performance Score: 29.22 | 3 |
| Theorem Proving | Metamath (test) | Pass@8: 65.6 | 2 |
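The Theorem Proving entry reports its result as Pass@8, but the table does not say how that figure was computed. A minimal sketch, assuming the widely used unbiased pass@k estimator (introduced alongside the HumanEval code benchmark) and assuming each theorem was attempted n >= k times, is shown below. The helpers `pass_at_k` and `mean_pass_at_k` are hypothetical names for illustration, not part of any Metamath tooling.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one theorem.

    Assumption: n proof attempts were sampled and c of them were verified
    as correct; the result is the probability that a random subset of k
    attempts contains at least one correct proof.
    """
    if not 0 <= c <= n:
        raise ValueError("c must lie between 0 and n")
    if k > n:
        raise ValueError("k must not exceed n")
    # Fewer than k failures means every k-subset contains a success.
    if n - c < k:
        return 1.0
    # 1 - P(all k chosen attempts are failures)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over theorems, given (n, c) pairs (hypothetical format)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Three hypothetical theorems, eight attempts each.
    results = [(8, 0), (8, 1), (8, 3)]
    print(mean_pass_at_k(results, k=8))  # 2 of 3 theorems were proved at least once
    print(mean_pass_at_k(results, k=1))
```

When k equals n, as in the usage above with k=8, the estimator reduces to the fraction of theorems with at least one successful attempt; a reported figure such as 65.6 would then correspond to that fraction expressed as a percentage.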