CMMLU

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Chinese Multitask Language Understanding | CMMLU | Accuracy 81.8 | 50 |
| Language Understanding | CMMLU | Accuracy 90.1 | 42 |
| Multitask Language Understanding | CMMLU (test) | Accuracy 78.3 | 38 |
| General Knowledge | CMMLU | Accuracy 89.5 | 25 |
| Multi-task Language Understanding | CMMLU | Accuracy 89.28 | 22 |
| Examination | CMMLU | Score 61.3 | 20 |
| Knowledge | CMMLU | Knowledge Score 84.72 | 16 |
| Chinese Language Knowledge and Reasoning | CMMLU | Score 77.01 | 14 |
| General Language Understanding | CMMLU | Overall Accuracy 77.3 | 14 |
| Comprehensive Examination | CMMLU (test) | Accuracy 68.1 | 14 |
| Chinese Language Understanding | CMMLU (test) | CMMLU Score 0.574 | 13 |
| Chinese Language Understanding | CMMLU | Score 90.9 | 10 |
| General Reasoning | CMMLU (test) | Accuracy 64.1 | 8 |
| Comprehensive cognitive reasoning | CMMLU | Score 53.45 | 8 |
| Chinese multiple-choice evaluation | CMMLU | CMMLU College Mathematics Accuracy 61.9 | 6 |
| Question Answering | CMMLU | Accuracy 88.1 | 6 |
| Medical Knowledge Evaluation | CMMLU Med | Accuracy 86.89 | 5 |
| Chinese General Knowledge | CMMLU | Accuracy 90.9 | 4 |
| Knowledge & Reasoning | CMMLU | Accuracy 63.4 | 4 |
| General Domains | CMMLU | Accuracy 0.865 | 4 |
| General Language Understanding | CMMLU 5-shot | Accuracy 90.61 | 3 |
| Language Understanding | CMMLU Cantonese | Accuracy (Humanities) 27.72 | 3 |
| Downstream Performance Prediction | CMMLU | MSE 0.0033 | 3 |
| General Competence Evaluation | CMMLU | Accuracy 84.94 | 2 |
| Multilingual Understanding | CMMLU | Score 72 | 2 |
Showing 25 of 27 rows