NuminaMath

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | NuminaMath | Accuracy | 60.9 | 39 |
| Mathematical Reasoning | NuminaMath | Accuracy | 78.67 | 20 |
| Mathematical Reasoning | NuminaMath | Math Accuracy | 62.45 | 18 |
| Mathematical Reasoning | NuminaMath (val) | Accuracy | 21.7 | 8 |
| LLM Inference | NuminaMath | Throughput (RPS) | 12.8 | 6 |
| Reward Prediction | NuminaMath (out-of-domain) | Accuracy | 42.22 | 6 |
| Formal Theorem Proving | NuminaMath-LEAN (total) | Accuracy | 51 | 4 |
| Formal Theorem Proving | NuminaMath LEAN (unsolved) | Accuracy | 26 | 4 |
| Formal Theorem Proving | NuminaMath-LEAN (solved-H) | Accuracy | 47 | 4 |
| Formal Theorem Proving | NuminaMath-LEAN (solved-K) | Accuracy | 100 | 4 |
| Formal Theorem Proving | NuminaMath LEAN (In-domain) | Average Token Cost | 1,707.19 | 4 |
| Mathematical Reasoning | NuminaMath (subset of 5,000 samples) | Accuracy | 73.94 | 3 |