| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | OpenBookQA | Accuracy 94.4 | 465 |
| Question Answering | OpenBookQA (OBQA) (test) | OBQA Accuracy 92.4 | 130 |
| Question Answering | OpenBookQA | Accuracy 84.4 | 84 |
| Reasoning | OpenBookQA | Accuracy 88.4 | 63 |
| Commonsense Reasoning | OpenBookQA | Accuracy 91 | 41 |
| Question Answering | OpenBookQA | Normalized Accuracy 45 | 35 |
| Open-book Question Answering | OpenBookQA 1.0 (test) | Accuracy 35 | 33 |
| Question Answering | OpenBook-QA | Accuracy 91.6 | 24 |
| Question Answering | OpenbookQA (OQA) (val) | Accuracy 36.6 | 22 |
| Question Answering | OpenBookQA (dev) | Accuracy 90 | 22 |
| Question Answering | OpenBookQA | Composite Score 92.14 | 20 |
| Question Answering | OpenBookQA | Attack Success Rate (ASR) 100 | 20 |
| Multiple Choice Question Answering | OpenBookQA | Accuracy 36.4 | 18 |
| Question Answering | OpenBookQA | Mean Per-Step Regret 0.157 | 15 |
| Question Answering | OpenBookQA published (test) | Accuracy 65.4 | 15 |
| Question Answering | OpenBookQA | Accuracy 84.83 | 15 |
| Commonsense Reasoning | OpenBookQA | Accuracy (Inter-layer) 75.6 | 15 |
| Question Answering | OpenBookQA Official Leaderboard | Accuracy 95.2 | 14 |
| Question Answering | OpenBookQA D^v (train) | Accuracy 100 | 12 |
| Question Answering | OpenbookQA | Open Accuracy 88 | 12 |
| Question Answering | OpenBookQA (D_eval) | Accuracy 75.4 | 12 |
| Question Answering | OpenBookQA D (train) | Accuracy 94.6 | 12 |
| Question Answering | OpenBookQA D^x (train) | Accuracy 93.5 | 12 |
| Knowledge | OpenBookQA (test) | Accuracy 92.31 | 11 |
| Common-sense QA | OpenbookQA | Accuracy 52.8 | 10 |