
Multi-task Language Understanding on MMLU-Pro (test)

95.71 Accuracy

SIGMA

[Chart: Accuracy over time, Jan 2025 to May 2026]

Evaluation Results

Date      Accuracy (%)
2026.05   95.71
2026.05   95.31
2026.05   95.1
2026.05   94.87
2026.05   94.73
2026.05   93.42
2026.05   93.29
2026.05   91.43
2026.05   91.43
2026.05   91.32
2026.05   88.76
2026.05   88.69
2026.05   88.57
2026.05   87.58
2026.05   84.17
2025.01   74.8
2025.10   71
2025.10   70
2025.10   69
2025.10   65
2025.10   65
2025.10   64
2025.01   63.71
2025.10   61
2025.10   47
2025.10   47
2025.10   45
2025.10   41
2025.10   39
2025.10   39
2025.10   36
2025.01   31.84
2025.10   26
2025.10   25
2025.01   22.1
2025.10   15
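The page does not say how each submission was scored. As a minimal sketch, assuming plain exact-match scoring where each question has a single gold option letter and each method outputs one predicted letter, an accuracy figure in this table's units (percent) could be computed like this. The function name `accuracy_percent` and the sample data are hypothetical, for illustration only.

```python
def accuracy_percent(predictions: list[str], gold: list[str]) -> float:
    """Return the percentage of questions whose predicted option letter
    matches the gold letter (hypothetical exact-match scoring)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty set of questions")
    # Compare letters case-insensitively, ignoring surrounding whitespace.
    correct = sum(
        p.strip().upper() == g.strip().upper() for p, g in zip(predictions, gold)
    )
    return 100.0 * correct / len(gold)


# Hypothetical example: 3 of 4 answers match, so accuracy is 75.0.
preds = ["A", "C", "J", "B"]
gold = ["A", "C", "D", "B"]
print(accuracy_percent(preds, gold))  # 75.0
```

Real evaluation harnesses typically add an answer-extraction step before this comparison, so reported numbers can also depend on how a model's free-form output is parsed into a letter.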