
General Capability Evaluation on OlmoBaseEval

Benchmarks: LBPP, BBH, MMLU Pro MC, Deepmind Math (HeldOut)

42.6 LBPP Score

Seed 36B

Dec 15, 2025
Updated 1mo ago

Evaluation Results

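| Date    | LBPP | BBH  | MMLU Pro MC | Deepmind Math (HeldOut) |
|---------|------|------|-------------|-------------------------|
| 2025.12 | 42.6 | 85   | 62.2        | 31.3                    |
| 2025.12 | 40.3 | 81.1 | 61.1        | 40.7                    |
| 2025.12 | 31.7 | 77   | 50.2        | 31.4                    |
| 2025.12 | 30.3 | 81.4 | 58.9        | 35.3                    |
| 2025.12 | 25.7 | 76.5 | 50.3        | 47.7                    |
| 2025.12 | 22.1 | 54.7 | 48.1        | 32.8                    |
| 2025.12 | 21.8 | 77.6 | 49.7        | 29.6                    |
| 2025.12 | 21.5 | 75.1 | 44.3        | 25.4                    |
| 2025.12 | 19.7 | 74.8 | 47.6        | 27.6                    |
| 2025.12 | 18.5 | 61.5 | 33.9        | 32.2                    |
| 2025.12 | 17.7 | 77.4 | 53.1        | 30.4                    |
| 2025.12 | 17.3 | 70.1 | 48.1        | 26.7                    |
| 2025.12 | 17.1 | 63.5 | 37.3        | 23.7                    |
| 2025.12 | 13.4 | 73.2 | 45.3        | 32.5                    |
| 2025.12 | 12.4 | 68.8 | 44.7        | 23                      |
| 2025.12 | 11.8 | 80.8 | 50.4        | 40.2                    |
| 2025.12 | 9.1  | 63   | 37.4        | 24.1                    |
| 2025.12 | 8.2  | 64.6 | 46.9        | 22                      |
| 2025.12 | 8.1  | 58.8 | 39.6        | 20.1                    |
| 2025.12 | 7.1  | 48.1 | 33.9        | 17.1                    |
| 2025.12 | 5.8  | 55.6 | 38.8        | 20.2                    |
| 2025.12 | 4.7  | 38.4 | 20.8        | 34.1                    |
| 2025.12 | 4.3  | 36.6 | 21.3        | 28.3                    |
| 2025.12 | 3.1  | 49.6 | 33.1        | 16.2                    |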