
Multitask Language Understanding on CMMLU (test)

78.3 Accuracy

GPT 4o

[Chart: accuracy over time, Mar 26, 2024 to May 20, 2025]
Updated 4d ago

Evaluation Results

Date      Accuracy
2024.10   78.3
2024.10   72.2
2024.03   70.1
2024.03   68.7
2024.10   68
2024.03   66.5
2024.03   66.3
2024.03   63
2024.03   63
2024.03   62.5
2024.03   61.3
2024.03   57
2024.10   55.1
2024.03   53.3
2025.05   52.1
2024.10   51.7
2024.10   46.6
2025.05   46.34
2024.03   44.6
2025.05   41.99
2025.05   40.91
2025.05   39.36
2024.03   38.8
2025.05   38.55
2025.05   37.02
2025.05   34.17
2025.05   34.05
2025.05   33.74
2025.05   33.26
2025.05   33.16
2024.03   31.9
2025.05   31.51
2025.05   31.17
2025.05   28.11
2025.05   27.19
2025.05   27.12
2025.05   26.59
2024.10   25.6