
PolyMath

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multilingual Mathematical Reasoning | PolyMath (test) | Accuracy (Ar): 20.3 | 30 |
| Translator-call mode selection | PolyMath High | Macro F1: 73.74 | 12 |
| Translator-call mode selection | PolyMath Medium | Macro F1: 75.22 | 12 |
| Translator-call mode selection | PolyMath Low | Macro F1 Score: 87.64 | 12 |
| Mathematical Reasoning | PolyMath | Accuracy: 20.9 | 12 |
| Mathematical Reasoning | PolyMath medium ALL 1.0 (test) | Accuracy: 25.24 | 12 |
| Mathematical Reasoning | PolyMath medium (EN) 1.0 (test) | Accuracy: 32 | 12 |
| Mathematical Reasoning | PolyMath English | Pass@1: 44 | 9 |
| Mathematical Reasoning | PolyMath Full | Accuracy (ar): 18.9 | 7 |
| Multilingual Reasoning | Polymath Low | Accuracy (en): 96.5 | 3 |
| Mathematical Reasoning | Polymath Low | Accuracy (en): 96.8 | 3 |