Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Long-context language modeling evaluation on RULER

100Single-key Accuracy

Full Attention

41.7656.887287.12Feb 6, 2025Apr 13, 2025Jun 18, 2025Aug 23, 2025Oct 28, 2025Jan 2, 2026Mar 9, 2026
Updated 5d ago

Evaluation Results

MethodLinks
2026.03
10010010010082.584.489.859.289.5--
2026.03
10010010099.581.280.389.859.288.8--
2026.03
10010096.41007883.781.659.287.4--
2025.02
10010099.7598.2599.495.33846291.0544.90
2025.02
1009387.592.7596.690415267.9320.325.4
2025.02
10010099999792.67635376.8933.915.6
2025.02
10010098.7599.2599.694.67645886.2852.45.2
2025.02
1009993979892.67565479.9817.112.2
2025.02
1001009999.599.493.33725989.2956.51.9
2025.02
1009999.2510010096827493.9190.60
2025.02
1006361.7571.7599.892506463.8687.932
2025.02
1009899.599.7510093.33676981.0988.613.7
2025.02
1008183.587.510093.33556471.0988.924.3
2025.02
1009899.510010094.33646683.8789.510.7
2025.02
1009296.2597.2510094.67686681.0489.313.7
2025.02
1009998.7510010095.33726990.6989.93.4
2025.02
999898.2597.598.692435478.3826.613.9
2025.02
99949594.2590.695.67605978.27300
2025.02
9989958589.495.33605769.1740.511.6
2026.03
95.995.997.597.576.376.979.659.284.8--
2026.03
95.99892.993.985.776.275.559.284.7--
2026.03
93.910094.994.973.189.877.65184.4--
2026.03
91.81001009979.687.181.659.287.3--
2025.02
845362.2560.581.495535557.5949.526.4
2025.02
743124.252681.894.67455341.5648.546.9
2025.02
702416.516.2582.692.67455339.5930.649.4
2025.02
6516811.2574.891.67345132.1532.258.9
2026.03
63.310098.598.58293.979.65183.3--
2025.02
442514.252071.692.33445534.8825.355.4