| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-hop tool-use evaluation | Multi-hop 450 tasks (test) | Accuracy: 89 | 18 |
| Information Retrieval | Multi-hop | NDCG@10: 58.16 | 12 |
| Multi-hop Retrieval | Multi-hop 4 datasets aggregate (test) | NDCG@10: 58.5 | 8 |
| Multi-hop reasoning | Multi-hop 2-hop N=500 | Accuracy: 79 | 2 |