Long-context Retrieval on RULER 4k context (test)
[Chart: Accuracy over time. Highlighted point: Llama-3.1-8B-Instruct, Accuracy 95, May 8, 2026.]
Updated 22d ago
Evaluation Results
| Method | Max context | Date | Accuracy |
|---|---|---|---|
| Llama-3.1-8B-Instruct | 128k | 2026.05 | 95 |
| Qwen3-8B | 32k | 2026.05 | 94.8 |
| gemma-3-12b-it | 128k | 2026.05 | 94.7 |
| Ministral-3-8B-Instruct-2512-BF16 | 256k | 2026.05 | 94.6 |
| Qwen3-4B | 32k | 2026.05 | 92.7 |
| Llama-3.2-3B-Instruct | 128k | 2026.05 | 92.5 |
| Moonlight-16B-A3B-Instruct | 128k | 2026.05 | 92.2 |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA | 8k | 2026.05 | 91.6 |
| gemma-3-4b-it | 128k | 2026.05 | 90.9 |
| GPT-5 nano | 400k | 2026.05 | 87.2 |
| Velvet-14B | 128k | 2026.05 | 85.5 |
| gpt-oss-20b | 128k | 2026.05 | 79.6 |
| EngGPT2-16B-A3B | 32k | 2026.05 | 79.5 |
| deepseek-moe-16b-chat | 4k | 2026.05 | 77.5 |
| FastwebMIIA-7B | 16k | 2026.05 | 74.9 |
| Minerva-7B-instruct-v1.0 | 4k | 2026.05 | 55 |