| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Common-sense Reasoning | COPA | Accuracy | 99.2 | 256 |
| Question Answering | COPA | Accuracy | 96 | 59 |
| Commonsense Reasoning | COPA (test) | Accuracy | 98.67 | 54 |
| Causal Reasoning | COPA | Accuracy | 90 | 51 |
| Sentence Completion | COPA | Accuracy | 92.88 | 48 |
| Multiple Choice | COPA | Accuracy | 100 | 36 |
| Causal Question Answering | COPA | EM | 99.3 | 32 |
| Multi-class Classification | Copa | Accuracy | 92 | 22 |
| Causal Reasoning | Copa100 | Accuracy | 83 | 12 |
| Commonsense Causal Reasoning | COPA (dev) | Accuracy | 93 | 7 |
| Commonsense reasoning | Balanced COPA | Accuracy | 70.7 | 6 |
| Commonsense Reasoning | COPA 2011 | Accuracy | 79 | 6 |
| Choice of Plausible Alternatives | COPA 11 languages | Score | 55.5 | 5 |
| Finetuning domain recovery | COPA | Recovery Score (Grader 1) | 5 | 4 |
| Inference correction review (discard) | COPA | MHA | 100 | 4 |
| Natural Language Inference | COPA | Accuracy | 80 | 3 |
| Commonsense Causal Reasoning | COPA 5-shot | Accuracy | 85 | 3 |
| Commonsense Reasoning | COPA es | Accuracy | 54.4 | 3 |
| Commonsense Reasoning | COPA en | Accuracy | 56 | 3 |
| Commonsense Reasoning | COPA (dev) | Accuracy | 86 | 3 |
| Choice of Plausible Alternatives | COPA (dev) | Accuracy | 65.8 | 3 |
| Inference correction review (correction) | COPA | MHA | 100 | 2 |
| Inference correction review (reason) | COPA | MHA | 100 | 2 |
| Timing comparison | COPA | MHA | 45.5 | 2 |
| Event/state classification | COPA | MHA | 90.9 | 2 |