CMMLU

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Chinese Multitask Language Understanding | CMMLU | Accuracy 81.8 | 50 |
| Multitask Language Understanding | CMMLU (test) | Accuracy 78.3 | 38 |
| Language Understanding | CMMLU | Accuracy 90.1 | 27 |
| Multi-task Language Understanding | CMMLU | Accuracy 89.28 | 22 |
| Examination | CMMLU | Score 61.3 | 20 |
| Chinese Language Knowledge and Reasoning | CMMLU | Score 77.01 | 14 |
| General Language Understanding | CMMLU | Overall Accuracy 77.3 | 14 |
| Comprehensive Examination | CMMLU (test) | Accuracy 68.1 | 14 |
| Chinese Language Understanding | CMMLU (test) | CMMLU Score 0.574 | 13 |
| Chinese Language Understanding | CMMLU | Score 90.9 | 10 |
| General Knowledge | CMMLU | Accuracy 88.4 | 9 |
| Comprehensive cognitive reasoning | CMMLU | Score 53.45 | 8 |
| Knowledge | CMMLU | Knowledge Score 84.72 | 6 |
| Medical Knowledge Evaluation | CMMLU Med | Accuracy 86.89 | 5 |
| Chinese General Knowledge | CMMLU | Accuracy 90.9 | 4 |
| Knowledge & Reasoning | CMMLU | Accuracy 63.4 | 4 |
| General Domains | CMMLU | Accuracy 0.865 | 4 |
| General Language Understanding | CMMLU 5-shot | Accuracy 90.61 | 3 |
| Language Understanding | CMMLU Cantonese | Accuracy (Humanities) 27.72 | 3 |
| Downstream Performance Prediction | CMMLU | MSE 0.0033 | 3 |
| Multilingual Understanding | CMMLU | Score 72 | 2 |
| Multilingual Knowledge | CMMLU | CMMLU Score 71.8 | 2 |
| Chinese Massive Multitask Language Understanding | CMMLU | CMMLU Score 57.4 | 2 |