NuminaMath

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | NuminaMath | Accuracy | 78.67 | 20 |
| Mathematical Reasoning | NuminaMath | Math Accuracy | 62.45 | 18 |
| Mathematical Reasoning | NuminaMath (val) | Accuracy | 21.7 | 8 |
| Reward Prediction | NuminaMath (out-of-domain) | Accuracy | 42.22 | 6 |
| Mathematical Reasoning | NuminaMath subset of 5,000 samples | Accuracy | 73.94 | 3 |
| Mathematical Reasoning | NuminaMath | Accuracy | 60.9 | 1 |