
General Reasoning on MMLU-Pro (pass@1 accuracy)

Top score: 73.44 (pass@1 accuracy)

Precision: FP16

[Chart: pass@1 accuracy by submission date, Jun 3, 2025 to May 9, 2026]
Updated 22d ago

Evaluation Results

Date (YYYY.MM)   pass@1 accuracy (%)
2026.05          73.44
2025.06          70.47
2026.05          70.23
2026.05          68.93
2026.05          68.33
2025.06          68.25
2026.05          66.86
2026.03          65.8
2026.05          65.71
2026.03          65.6
2026.03          64.8
2026.03          64.3
2026.05          63.41
2026.05          63.32
2026.05          63.24
2026.02          62.8
2026.05          62.63
2026.02          62.5
2026.03          62.5
2026.05          62.41
2026.03          62.3
2026.02          61.6
2026.02          61.5
2026.03          61.4
2026.03          61.2
2026.03          60.8
2026.02          60.3
2026.02          60.3
2026.03          59.9
2026.02          59.3
2026.05          58.84
2026.03          58.5
2026.02          58
2026.05          57.31
2026.02          56.9
2026.02          56.2
2026.02          55.6
2025.06          55.28
2025.10          54.7
2025.10          54.5
2026.02          54.2
2026.02          53.9
2026.05          53.32
2026.02          53.3
2025.10          53.3
2025.10          53
2025.10          52.9
2026.02          52.4
2026.03          52.37
2026.03          51.9
2026.05          51.7
2026.02          51.6
2026.03          51.56
2026.05          51.44
2026.03          51.43
2026.05          51.05
2026.03          50.59
2025.10          50.5
2026.03          50.44
2025.10          49.2
2026.05          47.94
2026.05          47.48
2026.05          47.1
2026.03          46.7
2025.12          46.5
2025.06          46.24
2026.05          45.84
2025.12          44.9
2025.12          44.5
2026.05          44.29
2025.12          44.1
2026.05          43.79
2026.03          43.2
2025.12          42.7
2025.12          42.7
2025.10          42.7
2025.10          42.5
2026.03          42.26
2026.03          42
2025.10          41.7
2026.03          41.46
2025.10          37.7
2026.03          37.54
2025.12          36.7
2025.10          34.1
2025.10          32.7
2026.03          23.41
2026.03          23.25
2026.05          22.34
2026.03          21.24
2026.03          21.03
2026.03          21.02
2025.10          16.9
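The page reports scores without describing how they are computed. The following is a minimal sketch, assuming each model gives one answer per question and the score is the percentage of questions answered correctly. The function names and the sample answers are hypothetical. It also includes the common unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), which reduces to c / n when k = 1.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one question: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct answer.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_at_1_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Percentage of questions whose single predicted option matches the gold option.

    Hypothetical helper: assumes exactly one prediction per question.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("no questions to score")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


if __name__ == "__main__":
    # Hypothetical MMLU-Pro-style answers: option letters A to J (up to 10 choices).
    gold = ["A", "C", "J", "B"]
    preds = ["A", "C", "D", "B"]
    print(pass_at_1_accuracy(preds, gold))  # 75.0
    # With n samples per question and c correct, pass@1 reduces to c / n.
    print(pass_at_k(n=4, c=3, k=1))  # 0.75
```

With a single answer per question, both functions agree: 3 correct out of 4 gives 75.0 percent, or 0.75 as a fraction.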