| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Open-domain Question Answering | 5 QA Benchmarks Average | Average Exact Match: 40.9 | 14 |
| Question Answering | QA Benchmarks Zero-shot (BoolQ, LAMBADA, PIQA, OPQA, Winogrande, ARC-E, ARC-C, HellaSwag) | BoolQ Accuracy: 74.86 | 6 |