
Long-context language modeling on InfiniteBench (test)

34.82 En QA Score

Llama 3.1 8B Instruct

[Chart: score history, Nov 5, 2025 to Jan 12, 2026]
Updated 8d ago

Evaluation Results

Method  Links
2025.11
34.82------10099.3287-
2025.11
34.69------10096.11-
2025.11
32.44------10092.259-
2025.11
27.57------6.786.442.4-
2026.01
1.01  1  0.99  0.99  0.99  1.01  1.01  1.02  1.01  1.01  1
2026.01
1  1  1  1  1  1  1  1  1  1  1
2026.01
0.83  0.8  0.9  0.85  0.84  0.83  0.72  0.76  0.7  0.73  0.82
2026.01
0.82  0.8  0.9  0.85  0.85  0.82  0.73  0.77  0.7  0.73  0.82
2026.01
0.74  0.58  0.77  0.71  0.71  0.69  0.67  0.48  0.62  0.68  0.69
2026.01
0.73  0.58  0.76  0.71  0.71  0.67  0.67  0.47  0.6  0.68  0.68
2026.01
0.53  0.54  0.54  0.53  0.54  0.54  0.54  0.54  0.53  0.53  0.53
2026.01
0.53  0.54  0.53  0.53  0.53  0.53  0.53  0.54  0.52  0.53  0.53
2026.01
0.5  0.5  0.51  0.5  0.51  0.5  0.5  0.52  0.5  0.5  0.5
2026.01
0.44  0.44  0.44  0.43  0.44  0.45  0.44  0.45  0.44  0.44  0.44