
Language Modeling Evaluation on MMLU, GSM8k, HellaSwag, WinoGrande

MMLU Accuracy: 72.98 (FP16), Sep 27, 2025
Updated 1mo ago

Evaluation Results

Date     MMLU   GSM8k  HellaSwag  WinoGrande  Average  % of first row
2025.09  72.98  90.9   75.52      70.56       77.49    -
2025.09  71.34  89.23  75.24      70.4        76.55    98.79
2025.09  71.06  88.32  74.58      68.03       75.5     97.43
2025.09  70.94  89.08  74.67      68.51       75.8     97.82
2025.09  70.9   88.17  75.01      70.09       76.04    98.13
2025.09  70.78  90.3   74.63      70.72       76.61    98.86
2025.09  70.45  87.41  74.25      68.9        75.25    97.11
2025.09  70.35  89.61  74.61      70.56       76.28    98.44
2025.09  70.19  86.35  73.02      68.11       74.42    96.04
2025.09  69.53  86.43  73.55      65.75       73.82    95.26
2025.09  69.45  87.34  74.03      69.85       75.17    97
2025.09  69.13  84.84  73.17      68.03       73.79    95.23
2025.09  69.09  86.66  73.47      67.96       74.3     95.88
2025.09  68.01  84.23  71.65      67.8        72.92    94.11
2025.09  67.69  84.23  71.24      67.4        72.64    93.74
2025.09  67.57  83.78  71.32      67.32       72.5     93.56
2025.09  67.27  81.58  71.41      66.38       71.66    92.48
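
The last two numbers in each row appear to be derived from the four benchmark scores: the mean of the four, and that mean as a percentage of the first row's mean. This reading is inferred from the arithmetic, not stated on the page, and the function names below are hypothetical. A minimal sketch that reproduces both values for the first two rows:

```python
# Scores copied from the first two rows of the results table.
# Assumed column order (taken from the page title): MMLU, GSM8k, HellaSwag, WinoGrande.
baseline = (72.98, 90.9, 75.52, 70.56)
candidate = (71.34, 89.23, 75.24, 70.4)


def average(scores):
    """Unweighted mean of the benchmark scores."""
    return sum(scores) / len(scores)


def relative_to_baseline(scores, baseline_scores):
    """Mean score expressed as a percentage of the baseline's mean score."""
    return 100.0 * average(scores) / average(baseline_scores)


print(round(average(baseline), 2))                          # 77.49
print(round(average(candidate), 2))                         # 76.55
print(round(relative_to_baseline(candidate, baseline), 2))  # 98.79
```

The same two functions reproduce the remaining rows to two decimal places, which is why the unweighted mean (rather than a weighted one) is assumed here.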