Long-context Retrieval on RULER 4k context (test)

95 Accuracy

Llama-3.1-8B-Instruct

Updated 2mo ago

Evaluation Results

Method	Links	Accuracy
Llama-3.1-8B-Instruct 2026.05		95
Qwen3-8B 2026.05		94.8
gemma-3-12b-it 2026.05		94.7
Ministral-3-8B-Instruct-2512-BF16 2026.05		94.6
Qwen3-4B 2026.05		92.7
Llama-3.2-3B-Instruct 2026.05		92.5
Moonlight-16B-A3B-Instruct 2026.05		92.2
LLaMAntino-3-ANITA-8B-Inst-DPO-ITA 2026.05		91.6
gemma-3-4b-it 2026.05		90.9
GPT-5 nano 2026.05		87.2
Velvet-14B 2026.05		85.5
gpt-oss-20b 2026.05		79.6
EngGPT2-16B-A3B 2026.05		79.5
deepseek-moe-16b-chat 2026.05		77.5
FastwebMIIA-7B 2026.05		74.9
Minerva-7B-instruct-v1.0 2026.05		55