
Multitask Language Understanding on MMLU (Exact Split, o=3)

Accuracy: 92.1

W/O Decontamination

Jan 27, 2026
Updated 4d ago

Evaluation Results

Method  Links
Date      Accuracy
2026.01   92.1      0.345
2026.01   90.5      0.235
2026.01   89.8      0.228
2026.01   89.2      0.441
2026.01   88.6      0.435
2026.01   88.6      0.31
2026.01   86.4      0.288
2026.01   83.7      0.167
2026.01   83.3      0.382
2026.01   80.7      0.137
2026.01   78.5      0.115
2026.01   75.1      0.078
2026.01   74.4      0.071
2026.01   74.1      0.165
2026.01   71.3      0.04
2026.01   70.7      0.157
2026.01   70.1      0.125
2026.01   69        0.114
2026.01   67.3      -
2026.01   67        -
2026.01   66.9      0.12
2026.01   64.8      0.098
2026.01   64.6      0.027
2026.01   64        0.033
2026.01   63.8      0.089
2026.01   57.6      -
2026.01   54.9      -
2026.01   53.6      0.014
2026.01   48.2      0.068
2026.01   47.8      0.072
2026.01   47.5      0.069
2026.01   47.1      0.065
2026.01   45.8      0.007
2026.01   45.1      -
2026.01   41.7      0.011
2026.01   41.6      0.254
2026.01   40.6      -
2026.01   38        0.071
2026.01   37.5      0.031
2026.01   37.1      0.302
2026.01   23.9      0.167
2026.01   22.4      0.227