| Dataset | SOTA Method | Metric | Score | # Results |
|---|---|---|---|---|
| Natural Questions (test) | GPT-3 | Accuracy | 29.9 | 27 |
| TriviaQA (standard) | GPT-3 | Accuracy | 71.2 | 24 |
| Closed Book QA (TriviaQA, TydiQA, NaturalQuestions, WebQuestions) | MIDAS | Accuracy | 21.8 | 14 |
| Natural Questions | U-PaLM | Accuracy | 40.1 | 12 |
| TriviaQA | U-PaLM | Accuracy | 82.0 | 12 |
| Natural Questions (dev) | FiD + Distillation | Accuracy | 54.4 | 9 |
| NaturalQuestions (test) | BART-Large | EM | 23.0 | 9 |
| TriviaQA TQ (test) | BART-Large | EM | 24.9 | 9 |
| TriviaQA unfiltered (test) | Chinchilla | Accuracy | 73.2 | 8 |
| TriviaQA filtered (dev) | FiD + Distillation | Accuracy | 72.5 | 7 |
| Natural Questions | PaLM | EM | 21.2 | 6 |
| MultiSpanQA (test) | CoVe (factored) | F1 | 48.0 | 5 |
| BoolQ | Self-consistency | Accuracy | 78.4 | 3 |
| HotpotQA | Self-consistency | EM | 33.8 | 3 |
| SQuAD adaptation 2 (test) | BART-Large (LM-finetuned) | EM | 1.8 | 2 |
| WebQuestions WB (test) | BART-Large | EM | 30.0 | 1 |
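Several rows report EM (exact match). A common way to compute it is to normalize the prediction and each gold answer, then count a hit when any pair matches exactly. The sketch below assumes SQuAD-style normalization: lowercase, then strip punctuation, then drop English articles, then collapse whitespace. The function names are hypothetical, and individual leaderboard entries may normalize answers differently, so scores from different papers are not guaranteed to be directly comparable.

```python
import re
import string

# Assumption: SQuAD-style normalization; other benchmarks may differ.
_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, remove punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """Return 1.0 if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in gold_answers))


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Corpus-level EM as a percentage (same 0-100 scale as the Score column)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    preds = ["The Eiffel Tower", "Paris, France"]
    golds = [["eiffel tower"], ["Paris"]]
    print(em_score(preds, golds))  # 50.0: only the first answer matches exactly
```

Because EM gives no partial credit, "Paris, France" scores zero against "Paris". Token-level F1, as reported for MultiSpanQA above, is the usual complement when partial overlap should count.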