
Mathematical Reasoning on OlmoBaseEval Math (GSM8K, GSM-Symbolic, MATH)

Best Math aggregate score: 68.5 (Kimi Linear)

[Chart: scores over time, Dec 15, 2025 to Apr 3, 2026]

Evaluation Results

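| Date | Math Aggregate | GSM8K | GSM-Symbolic | MATH |
|---|---|---|---|---|
| 2026.04 | 68.5 | 84.9 | 66.7 | 54 |
| 2025.12 | 67.2 | 84.5 | 65.4 | 51.6 |
| 2026.04 | 67.2 | 84.2 | 65.4 | 52 |
| 2026.04 | 65.7 | 80.9 | 64.8 | 51.4 |
| 2025.12 | 64.7 | 81.1 | 56.2 | 56.7 |
| 2025.12 | 63.2 | 81.3 | 61.2 | 47 |
| 2025.12 | 62 | 81.2 | 64.6 | 40.2 |
| 2025.12 | 61.9 | 80.6 | 61.2 | 43.8 |
| 2025.12 | 60.7 | 79.9 | 56.2 | 45.9 |
| 2025.12 | 59.5 | 79.3 | 59.1 | 40.1 |
| 2025.12 | 57.5 | 76.3 | 57.3 | 38.8 |
| 2026.04 | 55.1 | 74.3 | 49.2 | 41.8 |
| 2025.12 | 54.7 | 75.5 | 48.6 | 40 |
| 2026.04 | 54.6 | 76.8 | 55.4 | 31.7 |
| 2026.04 | 54.6 | 75.2 | 48.4 | 40.1 |
| 2025.12 | 54.3 | 74.3 | 53.3 | 35.2 |
| 2025.12 | 53.9 | 77.6 | 53.1 | 31 |
| 2026.04 | 53.2 | 86.3 | 68.6 | 4.6 |
| 2025.12 | 49.8 | 82.3 | 62.7 | 4.5 |
| 2025.12 | 49.3 | 69.1 | 42 | 36.8 |
| 2025.12 | 48.8 | 68.5 | 45.1 | 32.9 |
| 2025.12 | 46.2 | 66.7 | 44.4 | 27.4 |
| 2025.12 | 41.7 | 67.1 | 38.8 | 19.1 |
| 2025.12 | 41.5 | 61 | 35.5 | 27.9 |
| 2025.12 | 39.7 | 63 | 38.6 | 17.4 |
| 2025.12 | 39.6 | 60.9 | 33.6 | 24.3 |
| 2025.12 | 36.9 | 56.4 | 35.1 | 19.2 |
| 2026.04 | 33.7 | 57.1 | 25.8 | 18.2 |
| 2026.04 | 32.1 | 49.6 | 25.3 | 21.3 |
| 2025.12 | 29.2 | 48.2 | 26.3 | 13.1 |
| 2025.12 | 20.7 | 33.3 | 14.5 | 14.2 |
| 2026.04 | 18.3 | 32.8 | 10.8 | 11.3 |
| 2025.12 | 16.9 | 30 | 12.5 | 8.2 |
| 2025.12 | 15.3 | 26.9 | 10.3 | 8.7 |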