| Dataset Name | SOTA Method | Metric | Value | Count | Updated |
|---|---|---|---|---|---|
| Natural Questions (test) | GPT-3 | Accuracy | 29.9 | 27 | 1mo ago |
| TriviaQA (standard) | GPT-3 | Accuracy | 71.2 | 24 | 1mo ago |
| Closed Book QA (TriviaQA, TydiQA, NaturalQuestions, WebQuestions) | MIDAS | Accuracy | 21.8 | 14 | 1mo ago |
| Natural Questions | U-PaLM | Accuracy | 40.1 | 12 | 1mo ago |
| TriviaQA | U-PaLM | Accuracy | 82 | 12 | 1mo ago |
| Natural Questions (dev) | FiD + Distillation | Accuracy | 54.4 | 9 | 1mo ago |
| NaturalQuestions (test) | BART-Large | EM | 23 | 9 | 1mo ago |
| TriviaQA TQ (test) | BART-Large | EM | 24.9 | 9 | 1mo ago |
| Ever Young | | LLM Score | 81.67 | 8 | 5d ago |
| TriviaQA unfiltered (test) | Chinchilla | Accuracy | 73.2 | 8 | 1mo ago |
| TriviaQA filtered (dev) | FiD + Distillation | Accuracy | 72.5 | 7 | 1mo ago |
| Natural Questions | PaLM | EM | 21.2 | 6 | 1mo ago |
| MultiSpanQA (test) | CoVe (factored) | F1 Score | 48 | 5 | 1mo ago |
| HotpotQA | AGDLRPh | Recall | 39.6 | 4 | 1mo ago |
| BoolQ | Self-consistency | Accuracy | 78.4 | 3 | 1mo ago |
| SQuAD adaptation 2 (test) | BART-Large (LM-finetuned) | EM | 1.8 | 2 | 1mo ago |
| WebQuestions WB (test) | BART-Large | Exact Match | 30 | 1 | 1mo ago |