
Long-context evaluation on RULER 32K context length (test)

Chart: Niah1 Score over time, Jan 31, 2024 to Jun 3, 2025 (series: fp16 Baseline)

Evaluation Results

Method  Links
2024.01  100    99.8   98.6   94     68.2   11     55.95  64.5   37.88  9.64   30.4   31.6   31.6   56.4
2025.06  100    95.83  52.08  87.5   54.17  -      52.34  56.51  26.87  -      36.36  23.96  34.38  56.37
2025.06  100    98.96  83.33  93.75  78.12  -      65.62  54.17  20     -      43.4   28.12  37.5   63.91
2024.01  99.8   98.8   95.2   92.8   61.6   6.4    47.5   54.45  41.04  8.52   29.33  31     31     53.65
2025.06  97.92  93.75  54.17  83.33  71.88  -      59.38  43.49  16.04  -      51.39  30.21  35.42  57.91
2024.01  95.4   86.8   49.8   73.6   23.4   0      16.65  22.95  22.52  5.14   24     26.4   28.4   36.54
2025.06  93.75  100    91.67  93.75  81.25  -      66.67  52.08  21.04  -      48.61  30.21  36.46  65.04
2024.01  76     85.6   59.6   72.6   11.4   0      34.7   46.45  39.6   8.26   30.53  24.8   27.6   39.78
2025.06  25     2.08   0      0      0      -      0      0      1.04   -      19.44  14.58  16.67  7.16
2025.06  25     1.04   0      0      0      -      0.26   0      3.96   -      20.49  22.92  26.04  9.25
2025.06  1.04   5.21   1.04   6.25   1.04   -      6.25   4.17   0.21   -      54.86  22.92  27.08  11.82
2025.06  0      0      0      3.12   0      -      0.78   0      0      -      20.49  15.62  16.67  5.15