| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | WebQuestions (WebQs) | Accuracy | 63.5 | 67 |
| Open-domain Question Answering | WebQuestions (WebQ) (test) | Exact Match (EM) | 57.8 | 55 |
| Open-domain Question Answering | WebQuestions (WQ) Open-QA (test) | Exact Match | 48.7 | 38 |
| Passage retrieval | WebQuestions (WQ) (test) | Top-20 Accuracy | 85.4 | 37 |
| Open-domain Question Answering | WebQuestions (test) | Accuracy | 59.15 | 36 |
| Passage Ranking | WebQuestions (WQ) | R@10 | 63.97 | 28 |
| Open Question Answering | WEBQUESTIONS (test) | F1 Score | 42.2 | 27 |
| Open-Domain Question Answering | WebQuestions | Hits@1 | 84.6 | 19 |
| Open-Domain Question Answering | WebQuestions (WebQ) (dev) | Exact Match | 51.5 | 17 |
| Hallucination detection | WebQuestions | AUROC | 87.67 | 15 |
| Open-Domain Question Answering | WebQuestions (WQ) | Exact Match (EM) | 45.2 | 15 |
| Knowledge Base Question Generation | WebQuestions (test) | METEOR | 32.05 | 12 |
| Question Answering | WebQuestions (test) | F1 (Berant) | 41.8 | 11 |
| Open-domain QA | WebQuestions | Accuracy | 78.2 | 8 |
| Knowledge Base Question Answering | WebQuestions (WebQ) (test) | Average F1 | 53.6 | 7 |
| Question Answering | WebQuestions first 1000 samples (dev) | EM | 30.6 | 6 |
| Open-domain Question Answering | WebQuestions (WebQS) | WebQS Score | 8.17 | 5 |
| Question Answering | WebQuestions (WQ) (test) | Exact Match | 49.7 | 5 |
| Question Answering | WebQuestions | WebQ Accuracy | 25.92 | 4 |
| Question Generation | WebQuestions (test) | Syntactic Score | 4.53 | 3 |
| Closed-book Question Answering | WebQuestions WB (test) | Exact Match | 30 | 1 |