
Massive Multitask Language Understanding on MMLU

83.34 Accuracy

LoPT-GRPO

[Chart: accuracy over time, May 22, 2025 to May 6, 2026]
Updated 27d ago

Evaluation Results

Date       Accuracy
2026.05    83.34
2026.05    83.32
2026.05    83.24
-          81.7
-          77
2026.03    69.49
2026.03    67.32
2026.03    67
2025.08    65.24
2025.12    64.6
2025.12    63.2
2025.05    63
2026.03    62.87
2026.03    62.5
2026.03    62
2025.05    61.9
2025.08    60.9
2025.05    60.8
2026.03    60.13
2025.05    60.1
2026.03    60.08
2026.03    59.98
2026.01    59.6
2026.03    59.45
2025.05    59.4
2025.08    59.34
2026.03    59.02
2026.01    58.7
2026.01    58.7
2026.01    58.4
2026.01    57.7
2025.08    57.63
2026.03    57.51
2026.01    57.1
2025.05    57.1
2025.08    57.1
2026.03    57
2025.08    56.82
2026.01    56.6
2026.03    56.35
2026.04    56.3
2026.03    56
2026.01    55.9
2026.04    55.9
2026.03    55.8
2026.04    55.6
2026.01    55.53
2026.03    55.5
2025.05    55.4
2026.04    55.4
2026.01    55.34
2026.01    55.33
2026.04    55.3
2026.01    55.26
2026.01    55.11
2026.01    55.1
2026.04    55.1
2026.01    55
2026.03    55
2026.03    54.99
2026.01    54.98
2026.01    54.97
2026.03    54.97
2026.01    54.9
2025.05    54.9
2026.04    54.9
2026.01    54.79
2025.05    54.2
2025.05    54
2026.01    53.6
2026.03    53.57
2025.08    53.19
2025.08    53.11
2026.01    53
2025.07    52.88
2026.01    52
2026.01    50.2
2026.03    50
2025.07    49.67
2026.03    49
2025.07    47.68
2025.07    47.56
2025.07    47.47
2026.02    46.94
2025.07    46.85
2025.07    46.66
2026.03    46.27
2026.03    46.12
2026.03    46.07
2026.03    45.77
2026.03    45.35
2025.07    45.27
2026.02    44.87
2026.03    44.2
2026.02    44.06
2026.02    42.35
2026.03    41.8
2026.03    39.82
2026.03    39.23
2026.03    39.19
Showing 100 of 129 rows