
Language Understanding on MMLU (During-task and Post-Switch Accuracy)

During-task Accuracy: 90.8

Cloud LLM Cluster

[Chart: accuracy plot, dated Jan 29, 2026]
Updated 1mo ago

Evaluation Results

During-task Accuracy    Post-Switch Accuracy    Date
90.8                    90.8                    2026.01
69.5                    63                      2026.01
67.7                    62.4                    2026.01
66.7                    59.5                    2026.01
65.4                    61.8                    2026.01
64.8                    60.7                    2026.01
64.8                    59.1                    2026.01
64.2                    63.2                    2026.01
63.1                    60.6                    2026.01
63.1                    58.5                    2026.01
63                      61.3                    2026.01
61.5                    60.1                    2026.01
61.5                    56.3                    2026.01
61.2                    60.3                    2026.01
61.1                    59.6                    2026.01
59.8                    55.2                    2026.01
59.5                    56.2                    2026.01
59.2                    57.3                    2026.01
58.9                    55.6                    2026.01
58.9                    55.3                    2026.01
58.2                    55.2                    2026.01
57.9                    56.9                    2026.01
57.9                    56.9                    2026.01
57.5                    56.4                    2026.01
57.3                    54                      2026.01
56.3                    51.2                    2026.01
55.4                    51.8                    2026.01
55.4                    51.8                    2026.01
55.2                    52