| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Natural Questions (test) | GPT-3 | Accuracy | 29.9 | 27 | 3mo ago |
| TriviaQA (standard) | GPT-3 | Accuracy | 71.2 | 24 | 3mo ago |
| SimpleQA (train) | | Accuracy | 18.8 | 21 | 14d ago |
| Closed Book QA (TriviaQA, TydiQA, NaturalQuestions, WebQuestions) | MIDAS | Accuracy | 21.8 | 14 | 3mo ago |
| Natural Questions | U-PaLM | Accuracy | 40.1 | 12 | 3mo ago |
| TriviaQA | U-PaLM | Accuracy | 82 | 12 | 3mo ago |
| Natural Questions (dev) | FiD + Distillation | Accuracy | 54.4 | 9 | 3mo ago |
| NaturalQuestions (test) | BART-Large | EM | 23 | 9 | 3mo ago |
| TriviaQA TQ (test) | BART-Large | EM | 24.9 | 9 | 3mo ago |
| Ever Young | | LLM Score | 81.67 | 8 | 1mo ago |
| TriviaQA unfiltered (test) | Chinchilla | Accuracy | 73.2 | 8 | 3mo ago |
| TriviaQA filtered (dev) | FiD + Distillation | Accuracy | 72.5 | 7 | 3mo ago |
| Natural Questions | PaLM | EM | 21.2 | 6 | 3mo ago |
| MultiSpanQA (test) | CoVe (factored) | F1 Score | 48 | 5 | 3mo ago |
| ARC-c | | EM | 85.9 | 4 | 1mo ago |
| HotpotQA | AGDLRPh | Recall | 39.6 | 4 | 2mo ago |
| BoolQ | Self-consistency | Accuracy | 78.4 | 3 | 3mo ago |
| SQuAD adaptation 2 (test) | BART-Large (LM-finetuned) | EM | 1.8 | 2 | 3mo ago |
| WebQuestions WB (test) | BART-Large | Exact Match | 30 | 1 | 3mo ago |