| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Long-context retrieval | Needle-in-a-Haystack (test) | Accuracy | 100 | 56 |
| Needle-in-a-Haystack | Needle-in-a-Haystack | Accuracy | 100 | 44 |
| Needle Retrieval | Needle In A Haystack | Exact-match Accuracy | 67 | 40 |
| Needle-in-a-haystack | Needle-in-a-haystack 4x original context | Accuracy | 100 | 35 |
| Needle-in-a-Haystack Retrieval | Needle-in-a-Haystack 32K context (test) | Accuracy | 76 | 30 |
| Needle-in-a-Haystack Retrieval | Needle-in-a-Haystack 8K context (test) | Accuracy | 100 | 30 |
| Long-context Retrieval | Needle-in-a-Haystack | Retrieval Accuracy | 100 | 29 |
| Retrieval | Needle-in-a-Haystack L=8k | Accuracy | 100 | 24 |
| Key Information Retrieval | Needle-in-a-Haystack 32K context | Accuracy | 98.2 | 19 |
| Multi-key needle-in-a-haystack recall | Multi-key needle-in-a-haystack 16k context length | Recall | 100 | 16 |
| Synthetic Retrieval | Needle-In-A-Haystack (NIAH) | NIAH-1 Success Rate | 99.8 | 7 |
| Long-context Information Retrieval | Needle-In-a-Haystack Verbatim prompt (test) | Accuracy (Depth 0%) | 0.996 | 7 |
| Needle-In-a-Haystack | Needle-In-a-Haystack Gemini prompt (test) | Success Rate @ 0% Insertion | 57.2 | 7 |
| Information Retrieval | Needle In A Haystack | Recall@1K | 90 | 6 |
| Exact Retrieval | Needle-in-a-Haystack (NIAH) 64K | Average Accuracy | 97.6 | 5 |
| Exact Retrieval | Needle-in-a-Haystack (NIAH) 32K | Average Accuracy | 100 | 5 |
| Exact Retrieval | Needle-in-a-Haystack (NIAH) 16K | Average Accuracy | 100 | 5 |
| Long-context retrieval | Needle-in-a-Haystack 1.0 (test) | Score | 99.9 | 5 |
| Needle-in-a-haystack | Needle-in-a-haystack 8x original context | Accuracy | 52.2 | 4 |
| Needle-in-a-haystack | Needle-in-a-haystack 2x original context | Needle-in-a-haystack Accuracy (2x Context) | 74.92 | 4 |
| Long-context retrieval | Needle-in-a-Haystack (NiH) | Accuracy (512 tokens) | 100 | 3 |
| Information Retrieval | Needle In A Haystack | Retrieval Success (d=1) | 3 | 2 |