
MT-AIME

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Multilingual Reasoning | MT-AIME 24 | Accuracy (%): 44.4 | 40
Multilingual Math Reasoning | MT-AIME | Mean@3: 85.67 | 23