| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | Babilong 16k context length | QA1 Accuracy: 58 | 9 |
| Long-context reasoning | BABILong | Err (2k Context): 14.1 | 6 |
| Question Answering | Babilong 128k context length | QA1 Score: 38 | 5 |
| Question Answering | Babilong 64k context length | QA1 Score: 25 | 5 |