
General Reasoning on MMLU-Pro (Accuracy)

82.3 Accuracy

AFlow

[Chart: accuracy over time, Apr 15, 2025 to May 28, 2026]
Updated 5d ago

Evaluation Results

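| Date | Method | Accuracy | Links |
|---|---|---|---|
| 2026.01 | | 82.3 | - |
| 2026.01 | | 82 | - |
| 2026.01 | | 81.8 | - |
| | | 81.33 | - |
| 2026.05 | | 81.1 | - |
| | | 81.03 | - |
| 2026.05 | | 81 | - |
| 2026.05 | | 80.2 | - |
| 2026.01 | | 80.1 | - |
| 2025.06 | | 78.49 | - |
| 2025.06 | | 77.97 | - |
| 2026.01 | | 77.4 | - |
| 2025.06 | | 77.06 | - |
| 2026.05 | | 74.3 | - |
| 2026.05 | | 74.27 | - |
| 2026.05 | | 73.6 | - |
| 2026.05 | | 72.08 | - |
| | | 69.4 | - |
| | | 67.4 | - |
| 2025.12 | | 67.25 | - |
| 2026.02 | | 66.7 | - |
| 2026.05 | | 66.41 | - |
| 2026.03 | | 66 | - |
| 2026.05 | | 65.67 | - |
| 2026.03 | | 65.4 | - |
| 2026.03 | | 65.3 | - |
| 2026.05 | | 65 | - |
| 2026.03 | | 63.9 | - |
| 2026.05 | | 63.6 | - |
| 2026.05 | | 62.95 | - |
| 2026.02 | | 62.8 | - |
| 2026.05 | | 62.6 | - |
| 2026.05 | | 62.5 | - |
| 2026.05 | | 62.41 | - |
| 2026.04 | | 62.4 | - |
| 2026.05 | | 61.56 | - |
| 2026.02 | | 61.5 | - |
| 2026.03 | | 61.1 | - |
| 2026.02 | | 60.8 | - |
| 2026.03 | | 60.8 | - |
| 2026.03 | | 60.4 | - |
| 2026.05 | | 60.2 | - |
| 2026.05 | | 58.8 | - |
| 2026.02 | | 58.6 | - |
| 2026.03 | | 58.2 | - |
| 2026.05 | | 58.1 | - |
| 2025.04 | | 57.82 | - |
| 2026.01 | | 57.8 | - |
| 2026.04 | | 57.52 | - |
| 2026.04 | | 57.1 | - |
| 2026.04 | | 57 | - |
| 2026.04 | | 56.49 | - |
| 2026.05 | | 56.4 | - |
| 2026.05 | | 56.37 | - |
| 2026.01 | | 56.2 | - |
| 2025.08 | | 56.2 | - |
| 2025.04 | | 56.16 | - |
| 2026.04 | | 56.1 | - |
| 2026.02 | | 56 | - |
| 2025.04 | | 55.82 | - |
| 2025.04 | | 55.77 | - |
| 2026.05 | | 55.76 | - |
| 2026.03 | | 55.7 | - |
| 2026.04 | | 55.6 | - |
| 2025.04 | | 55.51 | - |
| 2026.02 | | 55 | - |
| 2025.04 | | 54.81 | - |
| 2026.01 | | 54.6 | - |
| 2026.03 | | 54.3 | - |
| 2025.04 | | 54.26 | - |
| 2026.01 | | 53.9 | - |
| 2026.03 | | 53.7 | - |
| 2026.01 | | 53.4 | - |
| 2025.07 | | 53 | - |
| 2026.05 | | 52.6 | - |
| 2026.05 | | 52.55 | - |
| 2025.07 | | 52.5 | - |
| 2025.07 | | 52.1 | - |
| 2025.04 | | 52.06 | - |
| 2026.02 | | 52 | - |
| 2026.03 | | 51.5 | - |
| 2025.08 | | 51.3 | - |
| 2026.01 | | 51 | - |
| 2026.05 | | 50.39 | - |
| 2026.05 | | 49.49 | - |
| 2025.07 | | 49.4 | - |
| 2025.07 | | 49.3 | - |
| 2025.04 | | 48.9 | - |
| 2026.05 | | 48.89 | - |
| 2026.01 | | 48.8 | - |
| 2026.05 | | 47.07 | - |
| 2026.01 | | 46.9 | - |
| 2026.05 | | 46.45 | - |
| 2025.12 | | 45.94 | - |
| 2026.03 | | 45.9 | - |
| 2025.08 | | 45.8 | - |
| 2026.01 | | 45.7 | - |
| 2026.01 | | 45.4 | - |
| 2025.08 | | 45.1 | - |
| 2025.04 | | 45 | - |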
Showing 100 of 201 rows