| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-hop Question Answering | 2WikiMQA | F1 Score: 76.4 | 154 |
| Question Answering | 2WikiMQA | F1: 74.9 | 44 |
| Multi-hop Reasoning | 2WikiMQA IRCoT 500 samples (test) | ACC: 52.8 | 27 |
| Question Answering | 2WikiMQA (test) | EM: 35.9 | 18 |
| Retrieval | 2WikiMQA (test) | Recall@K: 69.7 | 8 |
| Multi-hop Question Answering | 2WikiMQA (test) | Exact Match: 48.6 | 7 |
| Question Answering | 2WikiMQA (sampled) | Accuracy: 0.63 | 4 |