
Long-context language modeling evaluation on RULER 32K

Average Score (RULER 32K): 88.6

Full KV

[Chart: Average Score (RULER 32K) over time, Oct 9, 2025 to Mar 21, 2026]
Updated 2d ago

Evaluation Results

Date     Average  Per-task scores
2025.10  88.6     -      -      -      -      -      -      -      -      -
2025.10  88.5     -      -      -      -      -      -      -      -      -
2025.10  85.2     -      -      -      -      -      -      -      -      -
2026.03  75.4     99.6   98     98     97     46.6   94.15  22.28  69.78  53.2
2025.10  75.4     -      -      -      -      -      -      -      -      -
2026.03  62.64    100    95.2   96     38.4   31.05  64.35  21.6   65.8   51.4
2026.03  60.43    100    90.6   92     44     24.8   56.5   20.92  64.27  50.8
2026.03  49.84    98.4   85.6   92.4   8.4    7.75   17.1   21.24  66.47  51.2
2026.03  49.16    100    85.8   89.8   13.6   7.25   16.5   20.88  62.77  45.8
2026.03  45.31    100    80.6   78.8   14.4   3.69   5.05   20.48  60.27  44.6
2025.10  21.1     -      -      -      -      -      -      -      -      -
2026.03  5.85     0.8    0      0      0      0      0      4.04   24.2   23.6
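
In the rows that report per-task scores, the average appears to be the unweighted mean of the nine task values. The page does not state this; it is an assumption inferred from the numbers. A minimal Python sketch that checks it for three of those rows (the function name `matches_reported` and the 0.01 tolerance are hypothetical choices, not part of the leaderboard):

```python
from statistics import mean

# (reported average, nine per-task scores) for three 2026.03 rows.
# Assumption: the average is the plain mean of the nine task scores.
rows = [
    (75.4,  [99.6, 98, 98, 97, 46.6, 94.15, 22.28, 69.78, 53.2]),
    (62.64, [100, 95.2, 96, 38.4, 31.05, 64.35, 21.6, 65.8, 51.4]),
    (5.85,  [0.8, 0, 0, 0, 0, 0, 4.04, 24.2, 23.6]),
]


def matches_reported(reported, task_scores, tol=0.01):
    """Return True if the mean of task_scores is within tol of reported."""
    return abs(mean(task_scores) - reported) <= tol


for reported, task_scores in rows:
    status = "ok" if matches_reported(reported, task_scores) else "mismatch"
    print(f"reported={reported:6.2f}  recomputed={mean(task_scores):6.2f}  {status}")
```

The tolerance absorbs rounding in the displayed values; a row whose recomputed mean falls outside it would suggest either a different weighting or a mis-split of the score columns.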