| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | Standard QA Benchmarks (2WikiMultiHopQA, HotpotQA, Bamboogle, MuSiQue, Natural Questions, TriviaQA, PopQA) (test) | 2WikiMultiHopQA Pass@1 80.4 | 11 |
| Open-domain Question Answering | Standard QA Benchmarks Average | Avg@461 | 9 |