
MCEval

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Factual Knowledge Retrieval | MCEval mLAMA 8K (test) | Accuracy 79.5 | 14
Hallucination Evaluation | MCEval HaluEval 8K (test) | Accuracy 82.2 | 14
Commonsense Question Answering | MCEval CSQA 8K (test) | Accuracy 84.6 | 14
Paraphrase Identification | MCEval PAWS 8K (test) | Accuracy 88.9 | 14
Topic Classification | MCEval Agnews 8K (test) | Accuracy 88.3 | 14
Named Entity Recognition | MCEval NER 8K (test) | Accuracy 0.877 | 14
Image Captioning Evaluation | MCEval 1.0 (test) | Real Style Score 87.8 | 12
Code Infilling | McEval Multi-line | JavaScript Pass@1 40 | 10
Code Infilling | McEval Single-line | JavaScript Pass@1 78.8 | 10