
Metamath

Benchmarks

Task Name                  Dataset Name             SOTA Result                Trend
Mathematical Reasoning     MetaMath Insufficient    Success Rate (SR): 72.5    19
Mathematical Reasoning     MetaMath 1k              Token Count: 212           14
Automated Theorem Proving  Metamath (val)           Performance: 56.5          6
Formal Theorem Proving     Metamath set.mm (val)    Performance Score: 29.22   3
Theorem Proving            Metamath (test)          Pass@8: 65.6               2