
Long-context language modeling evaluation on RULER

[Figure: Single-key Accuracy (axis up to 100); series: Full Attention; date shown: Mar 9, 2026]
Updated 1mo ago

Evaluation Results

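| Method | Date | Score 1 | Score 2 | Score 3 | Score 4 | Score 5 | Score 6 | Score 7 | Score 8 | Score 9 | Links |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 2026.03 | 100 | 100 | 100 | 100 | 82.5 | 84.4 | 89.8 | 59.2 | 89.5 | |
| | 2026.03 | 100 | 100 | 100 | 99.5 | 81.2 | 80.3 | 89.8 | 59.2 | 88.8 | |
| | 2026.03 | 100 | 100 | 96.4 | 100 | 78 | 83.7 | 81.6 | 59.2 | 87.4 | |
| | 2026.03 | 95.9 | 95.9 | 97.5 | 97.5 | 76.3 | 76.9 | 79.6 | 59.2 | 84.8 | |
| | 2026.03 | 95.9 | 98 | 92.9 | 93.9 | 85.7 | 76.2 | 75.5 | 59.2 | 84.7 | |
| | 2026.03 | 93.9 | 100 | 94.9 | 94.9 | 73.1 | 89.8 | 77.6 | 51 | 84.4 | |
| | 2026.03 | 91.8 | 100 | 100 | 99 | 79.6 | 87.1 | 81.6 | 59.2 | 87.3 | |
| | 2026.03 | 63.3 | 100 | 98.5 | 98.5 | 82 | 93.9 | 79.6 | 51 | 83.3 | |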
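
Reading each results row as nine scores, the last score appears to be an aggregate of the other eight. The sketch below tests one possible reading, namely that it is their unweighted mean. That aggregation rule is an assumption, not something the page states, and `ROWS` and `mean_gap` are names made up for this illustration.

```python
# Results rows, each read as nine scores (values copied from the results list).
# Assumption: the ninth score is the unweighted mean of the first eight.
ROWS = [
    [100, 100, 100, 100, 82.5, 84.4, 89.8, 59.2, 89.5],
    [100, 100, 100, 99.5, 81.2, 80.3, 89.8, 59.2, 88.8],
    [100, 100, 96.4, 100, 78, 83.7, 81.6, 59.2, 87.4],
    [95.9, 95.9, 97.5, 97.5, 76.3, 76.9, 79.6, 59.2, 84.8],
    [95.9, 98, 92.9, 93.9, 85.7, 76.2, 75.5, 59.2, 84.7],
    [93.9, 100, 94.9, 94.9, 73.1, 89.8, 77.6, 51, 84.4],
    [91.8, 100, 100, 99, 79.6, 87.1, 81.6, 59.2, 87.3],
    [63.3, 100, 98.5, 98.5, 82, 93.9, 79.6, 51, 83.3],
]


def mean_gap(row):
    """Return |reported last score - mean of the preceding scores|."""
    *scores, reported = row
    return abs(sum(scores) / len(scores) - reported)


gaps = [mean_gap(row) for row in ROWS]
for row, gap in zip(ROWS, gaps):
    recomputed = sum(row[:-1]) / len(row[:-1])
    print(f"reported={row[-1]:5.1f}  recomputed={recomputed:6.2f}  gap={gap:.2f}")
```

A gap of about 0.05 or less in every row is what one would expect if the reported value is the mean rounded to one decimal place. A larger gap would suggest a different aggregation, such as a weighted average.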