Long-context Language Modeling on RULER (test) (4k-256k Sweep)

Top accuracy (4k Context): 96.6

[Chart: Accuracy (4k Context); series: Baseline; latest point Apr 7, 2026]
Updated 10d ago

Evaluation Results

Accuracy by context length

Date      4k     8k     16k    32k    64k    128k   256k
2026.04   96.6   94.1   92.1   88.7   74.3   74.8   41.7
2026.04   96.1   95.6   92.7   89.3   78.7   77     43.9
2026.04   95.1   93.6   91     87.8   77.8   66     -
2026.04   93.6   91.2   87.2   75.4   49     13.8   -
2026.04   93.3   93.2   91.1   86.8   78.6   46.1   -
2026.04   92.8   90.3   85.7   79.9   76.3   69.5   -
2026.04   87.8   83.4   78.6   69.9   56     42     -