| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | OBQA | Accuracy | 94.95 | 300 |
| Commonsense Reasoning | OBQA | Accuracy | 89.2 | 117 |
| Multiple Choice Question Answering | OBQA | Accuracy | 93.2 | 69 |
| Out-of-Distribution Detection | OBQA to MMLU | AUROC | 87.09 | 41 |
| Question Answering | OBQA | Zero-shot Accuracy | 35.2 | 36 |
| Reasoning | OBQA | Accuracy | 31.6 | 26 |
| Uncertainty Estimation | OBQA | AUROC | 88.03 | 24 |
| Commonsense Reasoning | OBQA | First-Token Accuracy | 91.4 | 24 |
| Zero-shot Prediction | OBQA | Accuracy | 31.4 | 17 |
| Multiple Choice Question Answering | OBQA (dev) | Accuracy | 86.1 | 17 |
| Speech-to-Text Question-Answering | OBQA | Accuracy | 83.08 | 16 |
| Commonsense Question Answering | OBQA | Accuracy | 93.4 | 14 |
| Question Answering | OBQA | Accuracy | 88.8 | 14 |
| Question Answering | OBQA (test) | Accuracy | 60.2 | 13 |
| Common-sense reasoning | OBQA In-Distribution | Accuracy | 88.43 | 12 |
| Reasoning | OBQA (leave-one-out setup) | Average Accuracy | 87.7 | 12 |
| Question Answering | OBQA (out-of-domain) | Accuracy | 95.59 | 12 |
| Question Answering | OBQA | Accuracy Improvement | 2.01 | 12 |
| OpenBook Question Answering | OBQA | Accuracy | 0.855 | 11 |
| Question Answering | OBQA | Accuracy (Normalized) | 41.8 | 9 |
| Question Answering | OBQA in-distribution (test) | Accuracy | 81.6 | 9 |
| Reasoning | OBQA (val) | Accuracy | 39.6 | 9 |
| Multiple-choice science question answering | OBQA In-Distribution 64 | Accuracy | 82.73 | 9 |
| Audio-conditioned reasoning | OBQA | Accuracy | 77.74 | 8 |
| Downstream Task | OBQA | Accuracy | 25.2 | 7 |