| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Abstention | Hotpot (test) | AUARC | 60.9 | 25 |
| Question Answering | ADVHOTPOT | Accuracy | 82.4 | 12 |
| Selective Question Answering | HOTPOT | Area under Coverage-F1 | 92.5 | 12 |
| Retrieval Question Answering | HotPot | MRR | 47.7 | 6 |
| Information Retrieval | Hotpot BEIR | nDCG | 0.687 | 5 |
| Multi-hop Question Answering | Hotpot Kimi | EM | 54.07 | 4 |
| Retrieval Question Answering | HotPot (in-domain) | MRR | 63.8 | 4 |
| Error Detection and Recovery | Hotpot Robot Data (test) | Recovery Success Ratio | 5 | 3 |
| Human-Robot Interaction | Hotpot | Successes | 10 | 2 |