
General Reasoning on MMLU-Pro (Accuracy)

82.3 Accuracy

AFlow

[Chart: accuracy over time, Apr 15, 2025 to Apr 5, 2026]
Updated 11d ago

Evaluation Results

Method  Links
2026.01  82.3
2026.01  82
2026.01  81.8
-        81.33
-        81.03
2026.01  80.1
2025.06  78.49
2025.06  77.97
2026.01  77.4
2025.06  77.06
2025.12  67.25
2026.02  66.7
2026.03  66
2026.03  65.4
2026.03  65.3
2026.03  63.9
2026.02  62.8
2026.02  61.5
2026.03  61.1
2026.02  60.8
2026.03  60.8
2026.03  60.4
2026.02  58.6
2026.03  58.2
2025.04  57.82
2026.01  57.8
2026.04  57.52
2026.04  57.1
2026.04  56.49
2026.01  56.2
2025.08  56.2
2025.04  56.16
2026.02  56
2025.04  55.82
2025.04  55.77
2026.03  55.7
2025.04  55.51
2026.02  55
2025.04  54.81
2026.01  54.6
2026.03  54.3
2025.04  54.26
2026.01  53.9
2026.03  53.7
2026.01  53.4
2025.04  52.06
2026.02  52
2026.03  51.5
2025.08  51.3
2026.01  51
2025.04  48.9
2026.01  48.8
2026.01  46.9
2025.12  45.94
2026.03  45.9
2025.08  45.8
2026.01  45.7
2026.01  45.4
2025.08  45.1
2025.04  45
2025.08  44.6
2026.01  44.5
2026.01  44.2
2025.08  44
2025.08  43.3
2026.01  43.2
2026.03  42.5
2026.01  42.3
2026.01  42.1
2026.01  41.1
2026.03  40.5
2025.12  40.36
2025.05  40.3
2025.12  40.05
2025.05  39.8
2026.02  39.8
2026.01  38.9
2025.05  38.6
2026.01  38.5
2025.08  38.4
2025.05  37.9
2025.05  37.7
2026.04  36.82
2026.01  36.8
2025.12  35.8
2026.01  35.6
2025.12  35.23
2025.12  35.02
2026.01  34.3
2026.01  33.9
2026.01  33.5
2026.02  33.2
2026.01  33.1
2026.01  32.5
2026.04  31.89
2026.03  31.7
2025.05  31.5
2026.02  31.4
2026.01  31.1
2025.05  31
Showing 100 of 114 rows