
C-Eval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Chinese Language Understanding | C-Eval | Accuracy 92.5 | 56 |
| General Knowledge Assessment | C-Eval | Accuracy 92.5 | 46 |
| Multi-level multi-discipline evaluation | C-Eval | Accuracy 81.4 | 28 |
| Language Understanding | C-Eval | C-Eval Score 87.7 | 24 |
| General Language Understanding | C-Eval (val) | Accuracy 78.68 | 18 |
| Knowledge | C-EVAL | Score 88.12 | 17 |
| Comprehensive Chinese Evaluation | C-Eval | Accuracy 89 | 16 |
| Chinese Language Knowledge and Reasoning | C-Eval | Overall Score 78.5 | 14 |
| Comprehensive Examination | C-Eval (test) | Accuracy 71.5 | 14 |
| C-Eval | C-Eval | Accuracy 42.79 | 13 |
| Chinese General Knowledge Question Answering | C-Eval | Accuracy 91.82 | 13 |
| General Knowledge Evaluation | C-Eval (test) | Accuracy 71.8 | 13 |
| Chinese Language Evaluation | C-Eval (val) | C-Eval 0-shot Score 83 | 12 |
| General Knowledge | C-Eval 1.0 (val) | Accuracy 78.68 | 12 |
| Knowledge | C-Eval | C-Eval Knowledge Accuracy 0.589 | 9 |
| General Knowledge Evaluation | C-Eval (val) | Accuracy 34.32 | 8 |
| Chinese Language Understanding | C-Eval (test) | Accuracy 86 | 7 |
| Question Answering | C-Eval | Accuracy 82.7 | 6 |
| Chinese Language Understanding | C-Eval | Exact Match 91.8 | 6 |
| Exam | C-Eval | Accuracy 83.28 | 4 |
| Language Understanding | C-Eval | Exact Match 92.5 | 4 |
| General Language Understanding | C-Eval 5-shot | Accuracy 0.9249 | 3 |
| Downstream Performance Prediction | C-Eval | MSE 0.0037 | 3 |
| General Competence Evaluation | C-EVAL | Accuracy 83.98 | 2 |
| Comprehensive Chinese Transformer Evaluation | C-Eval | C-Eval Score 55.5 | 1 |