| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | OBQA | Accuracy | 94.95 | 276 |
| Commonsense Reasoning | OBQA | Accuracy | 89.2 | 75 |
| Multiple Choice Question Answering | OBQA | Accuracy | 87.74 | 61 |
| Question Answering | OBQA | Zero-shot Accuracy | 35.2 | 36 |
| Zero-shot Prediction | OBQA | Accuracy | 31.4 | 17 |
| Multiple Choice Question Answering | OBQA (dev) | Accuracy | 86.1 | 17 |
| Question Answering | OBQA (test) | Accuracy | 60.2 | 13 |
| Question Answering | OBQA (out-of-domain) | Acc | 95.59 | 12 |
| Question Answering | OBQA | Accuracy Improvement | 2.01 | 12 |
| OpenBook Question Answering | OBQA | Accuracy | 0.855 | 11 |
| Speech-to-Text Question-Answering | OBQA | Accuracy | 65.9 | 9 |
| Question Answering | OBQA in-distribution (test) | Accuracy | 81.6 | 9 |
| Reasoning | OBQA (val) | Accuracy | 39.6 | 9 |
| Multiple-choice science question answering | OBQA In-Distribution 64 | Accuracy | 82.73 | 9 |
| Audio-conditioned reasoning | OBQA | Accuracy | 77.74 | 8 |
| Downstream Task | OBQA | Accuracy | 25.2 | 7 |
| Question Answering | OBQA | Accuracy | 90.1 | 6 |
| Reasoning | OBQA | Accuracy | 30 | 6 |
| Teacher Attribution | OBQA | Accuracy | 51 | 6 |
| Question Answering | OBQA | Accuracy (GPT-2-Small) | 17.8 | 4 |
| Open Book Question Answering | OBQA | Normalized PLL Score | 12.8 | 4 |
| Commonsense Reasoning | OBQA (dev) | Accuracy | 66.7 | 3 |