C-Eval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Chinese Language Understanding | C-Eval | Accuracy 92.5 | 47 |
| General Knowledge Assessment | C-Eval | Accuracy 92.5 | 37 |
| Multi-level multi-discipline evaluation | C-Eval | Accuracy 81.4 | 28 |
| Language Understanding | C-Eval | C-Eval Score 87.7 | 24 |
| General Language Understanding | C-Eval (val) | Accuracy 78.68 | 18 |
| Chinese Language Knowledge and Reasoning | C-Eval | Overall Score 78.5 | 14 |
| Comprehensive Examination | C-Eval (test) | Accuracy 71.5 | 14 |
| General Knowledge Evaluation | C-Eval (test) | Accuracy 71.8 | 13 |
| Knowledge | C-EVAL | Score 88.12 | 12 |
| Chinese Language Evaluation | C-Eval (val) | C-Eval 0-shot Score 83 | 12 |
| General Knowledge | C-Eval 1.0 (val) | Accuracy 78.68 | 12 |
| Chinese General Knowledge Question Answering | C-Eval | Accuracy 91.82 | 10 |
| General Knowledge Evaluation | C-Eval (val) | Accuracy 34.32 | 8 |
| Chinese Language Understanding | C-Eval (test) | Accuracy 86 | 7 |
| Chinese Language Understanding | C-Eval | Exact Match 91.8 | 6 |
| Language Understanding | C-Eval | Exact Match 92.5 | 4 |
| Knowledge | C-Eval | C-Eval Knowledge Accuracy 0.589 | 4 |
| General Language Understanding | C-Eval 5-shot | Accuracy 0.9249 | 3 |
| Downstream Performance Prediction | C-Eval | MSE 0.0037 | 3 |
| Comprehensive Chinese Evaluation | C-Eval | Accuracy 69 | 2 |
| Comprehensive Chinese Transformer Evaluation | C-Eval | C-Eval Score 55.5 | 1 |