MMATH

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mathematical Reasoning | MMATH | Accuracy: 78.4 | 36 |
| Multilingual Mathematical Reasoning | MMATH All Languages (test) | Average Score (All): 28.6 | 22 |
| Multilingual Mathematical Reasoning | MMATH Out-of-Domain Languages (test) | Vietnamese Accuracy: 30.1 | 22 |
| Multilingual Mathematical Reasoning | MMATH In-Domain Languages (test) | Accuracy (Ar): 26.3 | 22 |
| Math Problem Solving | MMATH (test) | Accuracy (Ar): 27.1 | 20 |
| Mathematical Reasoning | MMATH (test) | Ar Score: 37.4 | 10 |