Long-context Retrieval on RULER 32k context (test)
[Chart: Accuracy over time. Top entry: Ministral-3-8B-Instruct-2512-BF16, 91.7 Accuracy, May 8, 2026]
Updated 22d ago
Evaluation Results
| Method | Max context | Date | Accuracy |
|---|---|---|---|
| Ministral-3-8B-Instruct-2512-BF16 | 256k | 2026.05 | 91.7 |
| Qwen3-8B | 32k | 2026.05 | 91 |
| Qwen3-4B | 32k | 2026.05 | 89 |
| GPT-5 nano | 400k | 2026.05 | 88.9 |
| Llama-3.1-8B-Instruct | 128k | 2026.05 | 87.3 |
| gemma-3-12b-it | 128k | 2026.05 | 80.4 |
| Llama-3.2-3B-Instruct | 128k | 2026.05 | 77.8 |
| gpt-oss-20b | 128k | 2026.05 | 77.1 |
| gemma-3-4b-it | 128k | 2026.05 | 62.3 |
| EngGPT2-16B-A3B | 32k | 2026.05 | 42.6 |
| FastwebMIIA-7B | 16k | 2026.05 | 33.9 |
| Moonlight-16B-A3B-Instruct | 128k | 2026.05 | 32.7 |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA | 8k | 2026.05 | 28.5 |
| deepseek-moe-16b-chat | 4k | 2026.05 | 16.9 |
| Minerva-7B-instruct-v1,0 | 4k | 2026.05 | 10.1 |
| Velvet-14B | 128k | 2026.05 | 0 |