Capability Self-Assessment on Math

88.6 M-F1

OLMo2-7B

[Figure: results chart; y-axis ticks 25.992, 42.246, 58.5, 74.754; x-axis date May 29, 2026]

Evaluation Results

Method  Links
2026.05  88.6  26.6   101.9
2026.05  88.6  22.73  100.9
2026.05  84.2  19.07  96.6
2026.05  83.2  18.26  92.7
2026.05  80.9  17.8   100.4
2026.05  80.4  17.76  98.8
2026.05  80.1  17.17  100.7
2026.05  78.9  15.02  72.2
2026.05  77.9  16.06  49.7
2026.05  77.2  14.13  101.8
2026.05  76.1  15.26  98
2026.05  76    11.189.2
2026.05  75.3  10.94  100.8
2026.05  73.6  10.87  100
2026.05  72.7  9.31   84.3
2026.05  70.4  9.488.5
2026.05  70.3  9.75   76.7
2026.05  70.3  9.85   85.8
2026.05  68.9  7.86   73.1
2026.05  67.7  7.67   92
2026.05  67.7  6.35   102.3
2026.05  67.2  7.05   98.2
2026.05  66.8  7.24   100
2026.05  65.6  6.08   98.7
2026.05  64.8  6.14   95
2026.05  64.7  6.87   70.2
2026.05  64.2  5.15   96.4
2026.05  59.8  3.43   106.3
2026.05  55.2  5.98   100
2026.05  53.2  5.15   100
2026.05  50.7  -      100
2026.05  50.6  -      100
2026.05  48.8  -      100
2026.05  47.6  5.26   30.9
2026.05  45.6  2.95   13.8
2026.05  42.6  0.66   100
2026.05  40.6  5.5113.3
2026.05  40.4  3.4110.4
2026.05  29.6  3.622.1
2026.05  28.4  0.913.1