| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | QASC | Score 89.6 | 36 |
| Multiple Choice Question Answering | QASC | Accuracy 100 | 22 |
| Multiple Choice Question Answering | QASC (test) | Accuracy 78.5 | 21 |
| Scientific Reasoning Question Answering | QASC | Accuracy 74.61 | 15 |
| Question Answering | QASC | Recall@1 74.17 | 15 |
| Science Question Answering | QASC (test) | Accuracy 73.5 | 14 |
| Commonsense Reasoning | QASC (dev) | Accuracy 84.02 | 14 |
| Question Answering | QASC | Cohen's d 0.803 | 12 |
| Question Answering | QASC | Spearman's rho 0.2727 | 12 |
| Answer Plausibility Estimation | QASC | Cohen's d 0.803 | 10 |
| Question Answering | QASC | F1 14.73 | 10 |
| Multiple Choice Question Answering | QASC (dev) | Accuracy 67.61 | 10 |
| Chunking Strategy Evaluation for RAG | QASC Evaluation Set (5-fold cross-validation) | Precision 85 | 9 |
| Question Answering | QASC | Leakage Error 14 | 9 |
| Logical Refinement of Natural Language Explanations | QASC | Initial Score 17 | 8 |
| Domain-specific Question Answering | qasc | Accuracy 68.36 | 7 |
| Commonsense Question Answering | QASC (dev) | Accuracy 83.7 | 7 |
| Commonsense Reasoning | QASC (test) | Accuracy 90.06 | 6 |
| Commonsense Question Answering | Scientific Commonsense (QASC) 1.0 (test) | Accuracy 53.04 | 5 |
| Question Answering | QASC MRQA few-shot | F1 Score 99.1 | 5 |
| Commonsense Question Answering | QASC | Accuracy 72.8 | 4 |
| Question Answering | QASC | Accuracy 43 | 2 |