| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | SciQ | Accuracy | 96 | 226 |
| Multiple Choice Question Answering | SciQ | Accuracy | 100 | 74 |
| Science Question Answering | SciQ | Normalized Accuracy | 97 | 44 |
| Question Answering | SciQ (train) | Accuracy | 100 | 36 |
| Uncertainty Quantification | SciQ (test) | AUROC | 74.5 | 28 |
| Factual Question Answering | SciQ (ID) | Precision | 76.44 | 24 |
| Multiple Choice Question Answering | SciQ MC | Mean Per-Step Regret | 0.137 | 15 |
| Question Answering | SciQ Abstract | Mean Per-Step Regret | 0.135 | 15 |
| Distractor Generation | SciQ (test) | Precision@1 | 24.3 | 15 |
| Question Answering | SciQ (test) | Accuracy | 76.6 | 13 |
| Language Modeling | SciQ | Perplexity | 11.95 | 13 |
| Question Answering | SciQ (D_eval) | Accuracy | 71.4 | 12 |
| Reading Comprehension | SciQ | Accuracy | 93.7 | 11 |
| Science Question Answering | SciQ standard (test) | Accuracy | 90.2 | 8 |
| Downstream Task | SciQ | Accuracy | 89.3 | 7 |
| Question Answering | SciQ-ar | Accuracy | 55.68 | 6 |
| Question Answering | SciQ | ANLL | 52.8 | 4 |
| Hallucination Detection | SciQ (test) | ANLL | 59.9 | 4 |
| Disciplinary Knowledge | SciQ | Accuracy | 81.1 | 4 |
| Question Answering | SciQ Abstract | Accuracy | 80.6 | 3 |
| Reading Comprehension | SciQ | Exact Match | 58.92 | 3 |
| Question Answering | SciQ | Exact Match | 75.98 | 3 |
| Multiple Choice Question Answering | SciQ MC | Accuracy | 86.7 | 2 |
| Question Answering | SciQ | MAE | 0.0045 | 2 |
| Question Answering | SciQ MC | Accuracy | 86.7 | 1 |