
Long-context retrieval and reasoning on RULER 11 tasks average

Context Length 4K Performance: 99.34

May 9, 2026
Updated 21d ago

Evaluation Results

Method  Links
2026.05  99.34  98.83  98.55  94.89  89.85  79.32  93.46
2026.05  94.01  86.66  84.12  81.87  78.65  68.28  82.27
2026.05  83.6   75.54  71.12  66.95  57.47  47.99  67.11
2026.05  81.35  73.66  70.23  69.83  57.84  48.93  66.97
2026.05  39.81  18.42  12.1   10.57  9.91   8.18   16.5