
GMMLU

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multitask Language Understanding | GMMLU c | Acc (Normalized): 30.75 | 22 |
| Knowledge Evaluation | GMMLU c | Accuracy: 32 | 7 |
| Multi-task Language Understanding | GMMLU Spanish c | Acc (Normalized): 33.75 | 7 |
| Question Mining | GMMLU common 30 languages (test) | XSim Score: 11.3 | 7 |
| Question Mining | GMMLU all 41 languages (test) | XSim Score: 23.9 | 7 |
| Knowledge Reasoning | GMMLU c | Normalized Accuracy: 32 | 3 |
| Multitask Language Understanding | GMMLU Spanish (test) | Normalized Accuracy: 32.5 | 3 |