
C-Eval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Chinese Language Understanding | C-Eval | Accuracy: 92.5 | 68 |
| General Knowledge Assessment | C-Eval | Accuracy: 92.5 | 46 |
| Knowledge Evaluation | C-Eval (test) | Natural Sciences Score: 93.02 | 36 |
| Language Understanding | C-Eval | C-Eval Score: 87.7 | 29 |
| Multi-level multi-discipline evaluation | C-Eval | Accuracy: 81.4 | 28 |
| Comprehensive Chinese Evaluation | C-Eval | Accuracy: 89 | 22 |
| Knowledge | C-Eval | C-Eval Knowledge Accuracy: 0.6824 | 18 |
| General Language Understanding | C-Eval (val) | Accuracy: 78.68 | 18 |
| Knowledge | C-EVAL | Score: 88.12 | 17 |
| General Knowledge Evaluation | C-Eval (val) | Accuracy: 34.32 | 15 |
| Chinese Language Knowledge and Reasoning | C-Eval | Overall Score: 78.5 | 14 |
| Comprehensive Examination | C-Eval (test) | Accuracy: 71.5 | 14 |
| C-Eval | C-Eval | Accuracy: 42.79 | 13 |
| Chinese General Knowledge Question Answering | C-Eval | Accuracy: 91.82 | 13 |
| General Knowledge Evaluation | C-Eval (test) | Accuracy: 71.8 | 13 |
| Chinese Language Evaluation | C-Eval (val) | C-Eval 0-shot Score: 83 | 12 |
| General Knowledge | C-Eval 1.0 (val) | Accuracy: 78.68 | 12 |
| Knowledge-intensive reasoning | C-Eval | Score: 90.2 | 7 |
| Chinese Language Understanding | C-Eval (test) | Accuracy: 86 | 7 |
| General Knowledge | C-Eval | Score: 47.62 | 6 |
| Question Answering | C-Eval | Accuracy: 82.7 | 6 |
| Chinese Language Understanding | C-Eval | Exact Match: 91.8 | 6 |
| Exam | C-Eval | Accuracy: 83.28 | 4 |
| Language Understanding | C-Eval | Exact Match: 92.5 | 4 |
| Reasoning | C-Eval (val) | Acc (Normalized): 26.15 | 3 |
Showing 25 of 29 rows