| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multi-hop Question Answering | 2WikiMHQA | F1 Score | 85.56 | 55 |
| Multi-hop Reasoning | 2WikiMHQA | AUROC | 0.7002 | 26 |
| Multi-hop Question Answering | 2WikiMHQA (test) | EM | 71.85 | 17 |
| Multi-hop Question Answering | 2WikiMHQA in-distribution | Exact Match (EM) | 79.39 | 17 |
| Multi-hop Question Answering | 2WikiMHQA in-distribution v4 (test) | EM | - | 0 |