
MT-AIME

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Multilingual Reasoning | MT-AIME 24 | Accuracy (%): 44.4 | [trend chart] |