| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Knowledge-Intensive Language Tasks | KILT (test) | WoW F1 Score | 21.6 | 29 |
| Page-level retrieval | KILT (test) | WoW Score | 62.9 | 28 |
| Question Answering | KILT Benchmark Natural Questions, TriviaQA, HotpotQA (test) | EM Score (TriviaQA) | 85.6 | 10 |
| Passage-level Retrieval | KILT (dev) | FEV Score | 67.8 | 8 |
| Question Answering | KILT TriviaQA | Match Score | 91.9 | 5 |
| Question Answering | KILT HotpotQA | Match | 51.9 | 5 |
| Slot Filling | KILT T-REX (test) | Retrieval Score | 75.5 | 4 |
| Slot Filling | KILT ZSRE (test) | Retrieval | 80.5 | 4 |
| Long-form Question Answering | KILT ELI5 (test) | Retrieval Score | 36.3 | 4 |
| Knowledge-grounded Dialogue | KILT WoW (test) | Retrieval | 49.8 | 4 |
| Open Domain Question Answering | KILT NQ* (test) | Retrieval Rate | 65.1 | 4 |
| Paragraph-level Retrieval | KILT benchmark | FEVER Score | 62.8 | 4 |
| Long-form Question Answering | KILT ELI5 (dev test) | RL Score | 26.3 | 3 |