
Long-context retrieval on Needle-in-a-Haystack (test)

[Figure: Accuracy (y-axis, up to 100) by date (x-axis, Mar 12, 2026 to Apr 1, 2026); legend: FullKV]
Updated 16d ago

Evaluation Results

Date       Accuracy
2026.03    100
2026.04    100
2026.04    100
2026.03    99.46
2026.04    99
2026.03    98.02
2026.03    96.52
2026.03    95.75
2026.03    94.23
2026.03    93.94
2026.03    91.73
2026.03    90.97
2026.03    90.41
2026.03    88.5
           87
2026.03    85.11
2026.03    84.7
2026.03    84.3
2026.03    83.8
2026.03    79.67
2026.03    78.3
2026.03    77.89
2026.03    76.25
2026.03    75.75
2026.03    75.64
2026.03    75.39
2026.03    74.84
2026.03    74.8
2026.03    74.67
2026.03    74.55
2026.03    73.64
2026.03    72.84
2026.03    72.36
2026.03    70.45
2026.03    70.34
2026.03    68.34
2026.03    67.5
2026.03    66.81
2026.03    65.23
2026.03    63.7
2026.03    62.7
2026.03    62.2
2026.03    61.98
2026.03    61.7
2026.03    61.45
2026.03    61.32
2026.03    61.18
2026.03    59.73
2026.03    56.5
2026.03    56.11
2026.03    55.45
2026.03    52.25
2026.03    50.92
2026.03    47.61
2026.03    42.37
2026.03    40.36