CEval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Scientific Reasoning | CEval Sci | Score 66.19 | 20 |
| Scientific Reasoning | CEval Hard | Overall Score 56.58 | 19 |
| General Knowledge | CEval | Score 90.4 | 13 |
| Multi-task Language Understanding | CEval | Accuracy 44.7 | 13 |
| Actuator Inversion | All Environments (Ceval-in) | AER 0.57 | 8 |
| Language Understanding | CEval | Accuracy 63.03 | 8 |
| Multiple-choice Question Answering | CEval | Accuracy 79.86 | 7 |
| Chinese Knowledge | CEval | Accuracy 74.1 | 6 |
| General Knowledge Evaluation | CEVAL | Accuracy 85.52 | 5 |
| Medical Knowledge Evaluation | CEVAL Med | Accuracy 91.46 | 5 |
| General Language Understanding | CEval | Accuracy 73 | 4 |
| General Domains | CEval | Accuracy 90.91 | 4 |
| Knowledge Understanding | CEval | Accuracy 45 | 2 |