| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Factual Knowledge Retrieval | MCEval mLAMA 8K (test) | Accuracy | 79.5 | 14 |
| Hallucination Evaluation | MCEval HaluEval 8K (test) | Accuracy | 82.2 | 14 |
| Commonsense Question Answering | MCEval CSQA 8K (test) | Accuracy | 84.6 | 14 |
| Paraphrase Identification | MCEval PAWS 8K (test) | Accuracy | 88.9 | 14 |
| Topic Classification | MCEval Agnews 8K (test) | Accuracy | 88.3 | 14 |
| Named Entity Recognition | MCEval NER 8K (test) | Accuracy | 0.877 | 14 |
| Image Captioning Evaluation | MCEval 1.0 (test) | Real Style Score | 87.8 | 12 |
| Code Infilling | McEval Multi-line | JavaScript Pass@1 | 40 | 10 |
| Code Infilling | McEval Single-line | JavaScript Pass@1 | 78.8 | 10 |