| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | LV-Eval (test) | EM | 14.5 | 19 |
| Multi-hop Question Answering | LV-Eval (test) | F1 Score | 12.9 | 14 |
| Long-context Question Answering | LV-Eval | F1 Score | 14.81 | 14 |
| Question Answering | LV-Eval | Average Token Count | 51,066.2 | 7 |
| Multi-hop Question Answering | LV-Eval | Average Running Time (s) | 1.31 | 6 |
| Retrieval | LV-Eval | Average Running Time (s) | 0.41 | 5 |
| Long-context retrieval and reasoning | LV-Eval | Performance (16k Context) | 58.82 | 5 |
| Long-context language understanding | LV-Eval | CMRC (Mixup) | 7.05 | 4 |
| Multi-Hop QA | LV-Eval | EM | 10.5 | 3 |