CMMLU

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Chinese Multitask Language Understanding | CMMLU | Accuracy 81.8 | 67 |
| Language Understanding | CMMLU | Accuracy 90.1 | 62 |
| General Knowledge | CMMLU | Accuracy 89.5 | 50 |
| Multitask Language Understanding | CMMLU (test) | Accuracy 78.3 | 38 |
| Multilingual Understanding | CMMLU | Score 81.5 | 32 |
| Chinese General Knowledge | CMMLU | Accuracy 90.9 | 25 |
| Knowledge | CMMLU | Knowledge Score 84.72 | 25 |
| Multi-task Language Understanding | CMMLU | Accuracy 89.28 | 24 |
| Examination | CMMLU | Score 61.3 | 20 |
| Chinese Language Knowledge and Reasoning | CMMLU | Score 77.01 | 14 |
| General Language Understanding | CMMLU | Overall Accuracy 77.3 | 14 |
| Comprehensive Examination | CMMLU (test) | Accuracy 68.1 | 14 |
| Chinese Language Understanding | CMMLU (test) | CMMLU Score 0.574 | 13 |
| Chinese Language Understanding | CMMLU | Score 90.9 | 10 |
| General Reasoning | CMMLU (test) | Accuracy 64.1 | 8 |
| Comprehensive cognitive reasoning | CMMLU | Score 53.45 | 8 |
| Knowledge Evaluation | Cmmlu_c | Accuracy 36.88 | 7 |
| Chinese multiple-choice evaluation | CMMLU | CMMLU College Mathematics Accuracy 61.9 | 6 |
| Question Answering | CMMLU | Accuracy 88.1 | 6 |
| Multilingual Knowledge | CMMLU | CMMLU Score 71.8 | 6 |
| Medical Knowledge Evaluation | CMMLU Med | Accuracy 86.89 | 5 |
| Knowledge & Reasoning | CMMLU | Accuracy 63.4 | 4 |
| General Domains | CMMLU | Accuracy 0.865 | 4 |
| Knowledge | CMMLU (test) | Knowledge CMMLU Test Accuracy 32.11 | 3 |
| Knowledge Reasoning | CMMLU c | Accuracy (Normalized) 36.08 | 3 |
Showing 25 of 30 rows