Mathematics

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mathematical Reasoning | MATHEMATICS | Accuracy 74.1 | 46 |
| Mathematical Reasoning | Mathematics out-of-domain (test) | Accuracy 75.9 | 30 |
| Mathematical Reasoning | Mathematics | Accuracy 85.9 | 24 |
| Mathematical Reasoning | Mathematics | Pass@1 65.8 | 18 |
| Category Retrieval | Mathematics Amazon (test) | R@50 31.4 | 15 |
| Link Prediction | Mathematics | PREC@1 71.22 | 14 |
| Reranking | Mathematics | NDCG@5 47.1 | 14 |
| Reasoning | Mathematics | Normalized Score 100 | 9 |
| Mathematics | Mathematics (overall) | Mean Borda Score 5.1388 | 8 |
| Mathematical Optimization | Mathematics MinMaxMinDist | Score 4.1658 | 3 |
| Mathematical Optimization | Mathematics Circle-Packing | Score 2.636 | 3 |
| Language Modeling | Mathematics (val) | Perplexity 556.73 | 2 |
| Mathematics Evaluation | Mathematics Task | Token Match Rate 30 | 2 |