| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Information Retrieval | NIAH (test) | Average Score | 99.6 | 59 |
| Multi-needle retrieval | NIAH (M) | Accuracy (NIAH M) | 90.2 | 35 |
| Long-context Retrieval | NIAH Single 3 | Accuracy (1024) | 100 | 22 |
| Long-context Retrieval | NIAH 128k | Single Score | 24.4 | 20 |
| Long-context Retrieval | NIAH 64k | Single Score | 49.3 | 20 |
| Long-context retrieval | NIAH multivalue | Speedup | 4.1 | 20 |
| Needle-In-A-Haystack Retrieval | NIAH | NIAH Score | 100 | 14 |
| Long Context Retrieval | NIAH-Multi | Accuracy | 100 | 13 |
| Synthetic Retrieval | NIAH | Score @ 4096 Context | 100 | 9 |
| Long-context Retrieval | NIAH Single 2 | Accuracy (1024 tokens) | 100 | 8 |
| Long-context Retrieval | NIAH Single-1 | Accuracy (1024) | 100 | 8 |
| Needle-in-a-haystack retrieval | NIAH 64K 60 items, 3 needle positions (test) | F1 Score | 28.2 | 8 |
| Information Retrieval | NIAH single v1.0 (test) | Accuracy | 100 | 8 |
| Long-tail retrieval | NIAH 7-client multi-needle retrieval (non-IID partition) | MK-NIAH | 40 | 7 |
| Long-context retrieval | NIAH (avg) | Score (4k Context) | 100 | 7 |
| Runtime efficiency | NIAH (500 samples) | Latency (8K Context) | 34.01 | 6 |
| Needle In A Haystack retrieval | NIAH L=2048 | Accuracy | 100 | 6 |
| Long Context | NIAH | Accuracy | 99.8 | 6 |
| Long-context retrieval | NIAH 32k | NIAH Score | 99 | 6 |
| Long-context retrieval | NIAH 16k | NIAH Score | 98.6 | 6 |
| Needle-in-a-haystack | NIAH Needle-in-a-haystack | NIAH Success Rate (32K Context) | 100 | 6 |
| Long-context Retrieval | NIAH | Accuracy | 100 | 5 |
| Needle-in-a-haystack | NIAH 1 | Success Rate (1k Context) | 79.69 | 5 |
| Needle-in-a-haystack | NIAH-2 (test) | NIAH-2 Success Rate (1k) | 79.61 | 5 |
| Long-range retrieval | NIAH 32K | Accuracy | 90.9 | 4 |