Multi-task Language Understanding on MMLU-Redux

95.2 Accuracy

HieraMAS

[Chart: Accuracy over time, Feb 23, 2026 to May 16, 2026]
Updated 14d ago

Evaluation Results

Date     Accuracy
2026.02  95.2
2026.02  94.4
2026.02  94.4
2026.02  93.6
2026.02  93.6
2026.02  92.8
2026.02  92.8
2026.02  92
2026.02  92
2026.02  92
2026.02  91.2
2026.02  91.2
2026.02  91.2
2026.03  90
2026.03  89.9
2026.03  89.8
2026.02  89.6
2026.02  89.6
2026.02  88.8
2026.02  88.8
2026.02  88.33
2026.02  88
2026.02  83.2
2026.02  82.4
2026.02  81.67
2026.03  76.71
2026.03  75.5
2026.03  75.2
2026.03  74.8
2026.03  74.21
2026.03  74.12
2026.03  73.93
2026.03  72.72
2026.03  71.35
2026.03  70.37
2026.03  70.1
2026.03  68.4
2026.05  67.82
2026.05  67.79
2026.05  62.4      0.016-98.40.9914,973
2026.05  62.4      0.0085099.2-5,240
2026.03  58.71
2026.03  57.58
2026.03  56.97
2026.03  56.73
2026.03  56.44
2026.03  55.59
2026.03  52.49