Needle-in-a-Haystack

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Long-context retrieval | Needle-in-a-Haystack (test) | Accuracy: 100 | 56 |
| Needle-in-a-Haystack | Needle-in-a-Haystack | Accuracy: 100 | 44 |
| Needle-in-a-haystack | Needle-in-a-haystack 4x original context | Accuracy: 100 | 35 |
| Needle-in-a-Haystack Retrieval | Needle-in-a-Haystack 32K context (test) | Accuracy: 76 | 30 |
| Needle-in-a-Haystack Retrieval | Needle-in-a-Haystack 8K context (test) | Accuracy: 100 | 30 |
| Key Information Retrieval | Needle-in-a-Haystack 32K context | Accuracy: 98.2 | 19 |
| Retrieval | Needle-in-a-Haystack L=8k | Accuracy: 100 | 18 |
| Long-context Retrieval | Needle-in-a-Haystack | Retrieval Accuracy: 100 | 10 |
| Long-context Information Retrieval | Needle-In-a-Haystack Verbatim prompt (test) | Accuracy (Depth 0%): 0.996 | 7 |
| Needle-In-a-Haystack | Needle-In-a-Haystack Gemini prompt (test) | Success Rate @ 0% Insertion: 57.2 | 7 |
| Information Retrieval | Needle In A Haystack | Recall@1K: 90 | 6 |
| Exact Retrieval | Needle-in-a-Haystack (NIAH) 64K | Average Accuracy: 97.6 | 5 |
| Exact Retrieval | Needle-in-a-Haystack (NIAH) 32K | Average Accuracy: 100 | 5 |
| Exact Retrieval | Needle-in-a-Haystack (NIAH) 16K | Average Accuracy: 100 | 5 |
| Long-context retrieval | Needle-in-a-Haystack 1.0 (test) | Score: 99.9 | 5 |
| Needle-in-a-haystack | Needle-in-a-haystack 8x original context | Accuracy: 52.2 | 4 |
| Needle-in-a-haystack | Needle-in-a-haystack 2x original context | Needle-in-a-haystack Accuracy (2x Context): 74.92 | 4 |
| Long-context retrieval | Needle-in-a-Haystack (NiH) | Accuracy (512 tokens): 100 | 3 |