Language Understanding on MMLU (During-task and Post-Switch Accuracy)

During-task Accuracy: 90.8

Cloud LLM Cluster

[Chart: During-task Accuracy; date shown: Jan 29, 2026]

Evaluation Results

During-task Accuracy   Post-Switch Accuracy   Date
90.8                   90.8                   2026.01
69.5                   63                     2026.01
67.7                   62.4                   2026.01
66.7                   59.5                   2026.01
65.4                   61.8                   2026.01
64.8                   60.7                   2026.01
64.8                   59.1                   2026.01
64.2                   63.2                   2026.01
63.1                   60.6                   2026.01
63.1                   58.5                   2026.01
63                     61.3                   2026.01
61.5                   60.1                   2026.01
61.5                   56.3                   2026.01
61.2                   60.3                   2026.01
61.1                   59.6                   2026.01
59.8                   55.2                   2026.01
59.5                   56.2                   2026.01
59.2                   57.3                   2026.01
58.9                   55.6                   2026.01
58.9                   55.3                   2026.01
58.2                   55.2                   2026.01
57.9                   56.9                   2026.01
57.9                   56.9                   2026.01
57.5                   56.4                   2026.01
57.3                   54                     2026.01
56.3                   51.2                   2026.01
55.4                   51.8                   2026.01
55.4                   51.8                   2026.01
55.2                   52