
Massive Multitask Language Understanding on MMLU (Sub-category Performance)

82.7 STEM Accuracy

LeanQuant

[Chart: accuracy axis from about 76.0 to 81.2; data point dated May 6, 2026]
Updated 27d ago

Evaluation Results

Method   Links
2026.05  82.7  83.2  90.6  87.7  86.1
2026.05  82.6  83.2  90.8  87.7  86.1
2026.05  82.3  82.6  90.5  87.5  85.7
2026.05  76.7  77.4  89.3  85.7  82.3
2026.05  76.6  77.3  89.2  85.9  82.3
2026.05  76.3  77.2  89.3  85.2  82