| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multitask Language Understanding | GMMLU c | Acc (Normalized) | 30.75 | 22 |
| Knowledge Evaluation | GMMLU c | Accuracy | 32 | 7 |
| Multi-task Language Understanding | GMMLU Spanish c | Acc (Normalized) | 33.75 | 7 |
| Question Mining | GMMLU common 30 languages (test) | XSim Score | 11.3 | 7 |
| Question Mining | GMMLU all 41 languages (test) | XSim Score | 23.9 | 7 |
| Knowledge Reasoning | GMMLU c | Normalized Accuracy | 32 | 3 |
| Multitask Language Understanding | GMMLU Spanish (test) | Normalized Accuracy | 32.5 | 3 |