| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Long-context retrieval | Needle-in-a-Haystack (test) | Accuracy 100 | 56 |
| Needle-in-a-Haystack | Needle-in-a-Haystack | Accuracy 100 | 44 |
| Needle-in-a-haystack | Needle-in-a-haystack 4x original context | Accuracy 100 | 35 |
| Needle-in-a-Haystack Retrieval | Needle-in-a-Haystack 32K context (test) | Accuracy 76 | 30 |
| Needle-in-a-Haystack Retrieval | Needle-in-a-Haystack 8K context (test) | Accuracy 100 | 30 |
| Key Information Retrieval | Needle-in-a-Haystack 32K context | Accuracy 98.2 | 19 |
| Retrieval | Needle-in-a-Haystack L=8k | Accuracy 100 | 18 |
| Long-context Retrieval | Needle-in-a-Haystack | Retrieval Accuracy 100 | 10 |
| Long-context Information Retrieval | Needle-In-a-Haystack Verbatim prompt (test) | Accuracy (Depth 0%) 0.996 | 7 |
| Needle-In-a-Haystack | Needle-In-a-Haystack Gemini prompt (test) | Success Rate @ 0% Insertion 57.2 | 7 |
| Information Retrieval | Needle In A Haystack | Recall@1K 90 | 6 |
| Exact Retrieval | Needle-in-a-Haystack (NIAH) 64K | Average Accuracy 97.6 | 5 |
| Exact Retrieval | Needle-in-a-Haystack (NIAH) 32K | Average Accuracy 100 | 5 |
| Exact Retrieval | Needle-in-a-Haystack (NIAH) 16K | Average Accuracy 100 | 5 |
| Long-context retrieval | Needle-in-a-Haystack 1.0 (test) | Score 99.9 | 5 |
| Needle-in-a-haystack | Needle-in-a-haystack 8x original context | Accuracy 52.2 | 4 |
| Needle-in-a-haystack | Needle-in-a-haystack 2x original context | Needle-in-a-haystack Accuracy (2x Context) 74.92 | 4 |
| Long-context retrieval | Needle-in-a-Haystack (NiH) | Accuracy (512 tokens) 100 | 3 |