| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | OpenBookQA | Accuracy 94.4 | 465 |
| Question Answering | OpenBookQA (OBQA) (test) | OBQA Accuracy 92.4 | 130 |
| Question Answering | OpenBookQA | Accuracy 84.4 | 126 |
| Question Answering | OpenBookQA | Accuracy 95.2 | 119 |
| Question Answering | OpenBookQA | Normalized Accuracy 55.6 | 102 |
| Reasoning | OpenBookQA | Accuracy 88.4 | 77 |
| Commonsense Reasoning | OpenBookQA | Accuracy 91 | 71 |
| Multiple-choice Question Answering | OpenBookQA (test) | Accuracy 90.8 | 39 |
| Open-book Question Answering | OpenBookQA 1.0 (test) | Accuracy 35 | 33 |
| Zero-shot Reasoning | OpenbookQA | Accuracy 44 | 26 |
| Question Answering | OpenBook-QA | Accuracy 91.6 | 24 |
| Question Answering | OpenbookQA (OQA) (val) | Accuracy 36.6 | 22 |
| Question Answering | OpenBookQA (dev) | Accuracy 90 | 22 |
| Common Sense | OpenBookQA | Accuracy 81.8 | 21 |
| Question Answering | OpenBookQA | Composite Score 92.14 | 20 |
| Question Answering | OpenBookQA | Attack Success Rate (ASR) 100 | 20 |
| Multiple Choice Question Answering | OpenBookQA | Accuracy 36.4 | 18 |
| Question Answering | OpenBookQA | OpQA Score 47 | 15 |
| Question Answering | OpenBookQA | Mean Per-Step Regret 0.157 | 15 |
| Question Answering | OpenBookQA published (test) | Accuracy 65.4 | 15 |
| Commonsense Reasoning | OpenBookQA | Accuracy (Inter-layer) 75.6 | 15 |
| Question Answering | OpenBookQA Official Leaderboard | Accuracy 95.2 | 14 |
| Audio Question-Answering | OpenBookQA | Score 91.4 | 12 |
| Open Book Question Answering | OpenBookQA | Normalized Log Accuracy 89.4 | 12 |
| Question Answering | OpenBookQA D^v (train) | Accuracy 100 | 12 |