CEval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Scientific Reasoning | CEval Sci | Score 66.19 | 20 |
| Scientific Reasoning | CEval Hard | Overall Score 56.58 | 19 |
| Language Understanding | CEval | Accuracy 63.03 | 17 |
| General Knowledge | CEval | Score 90.4 | 13 |
| Multi-task Language Understanding | CEval | Accuracy 44.7 | 13 |
| Actuator Inversion | All Environments (Ceval-in) | AER 0.57 | 8 |
| Multiple-choice Question Answering | CEval | Accuracy 79.86 | 7 |
| Chinese Knowledge | CEval | Accuracy 74.1 | 6 |
| General Knowledge Evaluation | CEVAL | Accuracy 85.52 | 5 |
| Medical Knowledge Evaluation | CEVAL Med | Accuracy 91.46 | 5 |
| General Knowledge and Reasoning | CEval | Accuracy 90.91 | 4 |
| General Language Understanding | CEval | Accuracy 73 | 4 |
| General Domains | CEval | Accuracy 90.91 | 4 |
| Knowledge Understanding | CEval | Accuracy 45 | 2 |