| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Boolean Question Answering | BoolQ | Accuracy 91.26 | 307 |
| Question Answering | BoolQ | Accuracy 90.9 | 240 |
| Reading Comprehension | BOOLQ | Accuracy 94.47 | 219 |
| Common Sense Reasoning | BoolQ | Accuracy 92.4 | 131 |
| Text Classification | BoolQ | Accuracy 90.7 | 84 |
| Question Answering | BoolQ (test) | Accuracy 91.752 | 46 |
| Boolean Question Answering | BoolQ (test) | Accuracy (Avg) 86.7 | 38 |
| Boolean Question Answering | BoolQ | Zero-shot Accuracy 0.8229 | 36 |
| Reading Comprehension | BoolQ (val) | Accuracy 97.7 | 34 |
| Yes/No Reading Comprehension | BoolQ 1.0 (test) | Normalized Accuracy 69 | 33 |
| Faithfulness evaluation | BoolQ | AUC π-Soft-NS 37 | 27 |
| Factual Knowledge | Bool Q | Accuracy 82.39 | 26 |
| Boolean Question Answering | BoolQ | Delta Accuracy -0.01 | 24 |
| Binary Classification | BoolQ HELM | Balanced Accuracy 89.75 | 18 |
| Commonsense Reasoning | BoolQ | Accuracy 87.29 | 18 |
| Boolean Question Answering | BoolQ | Calibrated Accuracy 86.1 | 18 |
| Zero-shot Prediction | BoolQ | Accuracy 77.68 | 17 |
| Explanation Evaluation | BoolQ (test) | Sufficiency 20.78 | 16 |
| Reading Comprehension | BoolQ (test) | Accuracy 99.87 | 16 |
| Question Answering | BoolQ | Delta Accuracy 2.16 | 15 |
| Binary Question Answering | BoolQ | Accuracy (Neutral) 85.22 | 15 |
| Commonsense Reasoning | BoolQ | Accuracy (Inter-Layer Filtering) 67 | 15 |
| Boolean Question Answering | BoolQ-NP | Accuracy 73.41 | 14 |
| Yes/No Question Answering | BoolQ (test) | Accuracy 79.2 | 12 |
| Reading Comprehension | BoolQ SuperGLUE (val) | Accuracy 78.57 | 9 |