| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Boolean Question Answering | BoolQ | Accuracy | 91.26 | 350 |
| Question Answering | BoolQ | Accuracy | 90.9 | 317 |
| Reading Comprehension | BOOLQ | Accuracy | 94.47 | 279 |
| Common Sense Reasoning | BoolQ | Accuracy | 92.4 | 240 |
| Reading Comprehension | BoolQ | Accuracy (BoolQ) | 88.07 | 228 |
| Question Answering | BoolQ | Accuracy | 90.03 | 201 |
| Text Classification | BoolQ | Accuracy | 90.7 | 118 |
| Question Answering | BoolQ (test) | Accuracy | 91.752 | 62 |
| Boolean Question Answering | BoolQ | Accuracy | 85.9 | 57 |
| Multiple-choice Question Answering | BoolQ | MC Accuracy | 0.887 | 46 |
| Factual Knowledge | Bool Q | Accuracy | 87.7 | 44 |
| Reading Comprehension | BoolQ (test) | Accuracy | 99.87 | 43 |
| Commonsense Reasoning | BoolQ | Accuracy | 87.6 | 41 |
| Boolean Question Answering | BoolQ (test) | Accuracy (Avg) | 86.7 | 41 |
| Boolean Question Answering | BoolQ | Zero-shot Accuracy | 0.8229 | 36 |
| Reading Comprehension | BoolQ (val) | Accuracy | 97.7 | 34 |
| Feature Attribution | BoolQ | Comprehensiveness | 72 | 33 |
| Yes/No Reading Comprehension | BoolQ 1.0 (test) | Normalized Accuracy | 69 | 33 |
| Closed-domain QA | BoolQ | EM | 85.2 | 30 |
| Boolean Question Answering | BoolQ | Accuracy | 92.3 | 29 |
| Boolean Question Answering | BoolQ | Accuracy | 88.91 | 27 |
| Faithfulness evaluation | BoolQ | AUC π-Soft-NS | 37 | 27 |
| Boolean Question Answering | BoolQ | Delta Accuracy | -0.01 | 24 |
| Classification | BoolQ (test) | Accuracy | 67.4 | 22 |
| Question Answering | BoolQ | Loss | 0.23 | 20 |