Multi-task Language Understanding on CMMLU

Accuracy: 89.28

BF16

[Chart: accuracy over time, Mar 7, 2024 to Feb 11, 2026; y-axis ticks 23.8848, 40.8624, 57.84, 74.8176]
Updated 4d ago

Evaluation Results

Date      Accuracy  Delta
2026.02   89.28     -
2026.02   88.71     -0.57
2026.02   87.76     -1.52
2026.02   87.44     -1.84
2024.03   83.7      -
2026.02   82.74     +1.09
2026.02   81.65     -
2024.03   75.5      -
2026.02   72.24     -9.41
2026.02   72.12     -9.53
2024.03   71        -
2024.03   71        -
2024.03   61.97     -
2024.03   61.8      -
2024.03   59        -
2024.03   58        -
2024.03   55.5      -
2024.03   53.3      -
2025.02   45.7      -
2025.02   35.6      -
2025.02   27.5      -
2025.02   26.4      -