
Massive Multitask Language Understanding on MMLU (Overall Accuracy)

MMLU Accuracy: 84.2

MARI

[Chart: MMLU accuracy over time, May 12, 2026 to May 27, 2026]
Updated 6d ago

Evaluation Results

Date      Accuracy  Method  Links
2026.05   84.2
2026.05   83.7
2026.05   83.6
2026.05   83.5
2026.05   83.4
2026.05   83.2
2026.05   81.6
2026.05   81.2
2026.05   81
2026.05   80.9
2026.05   80.8
2026.05   80.6
2026.05   76.58
2026.05   76.58
2026.05   76.34
2026.05   76.25
2026.05   76.08
2026.05   76.08
2026.05   76
2026.05   73.1
2026.05   72.6
2026.05   72.5
2026.05   72.4
2026.05   72.3
2026.05   72.1
2026.05   66.6
2026.05   66.1
2026.05   66
2026.05   65.9
2026.05   65.8
2026.05   65.5
2026.05   26.7
2026.05   26.3
2026.05   26.1
2026.05   26
2026.05   25.9
2026.05   25.7
2026.05   23.6
2026.05   23.4
2026.05   23.3
2026.05   23.2
2026.05   23.2
2026.05   23.1