Zero-shot Language Modeling and Reasoning on MMLU-CoT, GSM8k, HellaSwag, and WinoGrande

72.76 MMLU-CoT Accuracy

FP16

Sep 27, 2025
Updated 1mo ago

Evaluation Results

Date     MMLU-CoT  GSM8k  HellaSwag  WinoGrande  Average  % of top row
2025.09  72.76     85.06  80.01      77.9        78.93    100
2025.09  72.5      84.8   80.2       77.4        78.73    99.74
2025.09  72.4      84.4   80         77.3        78.53    99.48
2025.09  72.4      84.7   79.8       77.7        78.65    99.64
2025.09  71.8      84.5   79.9       78.1        78.58    99.55
2025.09  69.71     82.26  79.14      75.53       76.66    97.12
2025.09  69.12     80.8   78.17      75.24       75.84    96.08
2025.09  68.9      79.5   79.5       74.7        75.7     95.9
2025.09  68.85     82.6   78.26      74.51       75.72    95.92
2025.09  68.69     81.58  77.59      73.4        75.32    95.42
2025.09  68.59     81.73  78.38      74.27       75.74    95.96
2025.09  68.56     78.17  78.64      75.14       75.13    95.18
2025.09  68.3      79.61  77.6       73.48       74.75    94.71
2025.09  68.26     78.39  78.15      74.11       74.73    94.67
2025.09  67.41     78.01  77.31      73.48       74.05    93.82
2025.09  67.19     75.7   76.91      74.8        73.65    93.31
2025.09  66.5      77.4   77.25      75.14       74.1     93.8
2025.09  66.5      76.1   76.96      75.32       73.7     93.4
2025.09  66.36     76.65  77.38      72.48       73.21    92.75
2025.09  65.96     74.68  77.62      74.19       73.11    92.63
2025.09  65.48     74.83  76.63      73.09       72.51    91.86
2025.09  63.93     68.54  75.1       73.56       70.3     89.06
2025.09  63.49     68.46  76.01      74.51       70.62    89.47
2025.09  62.38     72.48  75.29      71.67       70.45    89.26
2025.09  62.21     67.85  73.99      73.24       69.32    87.83
2025.09  61.8      68.16  74.87      72.93       69.4     88
2025.09  61.22     67.7   75.04      71.67       68.91    87.3
2025.09  58.44     61.64  73.94      71.19       66.3     84
2025.09  55.06     56.79  72.06      68.27       63.05    79.87
2025.09  49.86     56.94  73.5       71.43       62.9     79.7
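
The last two figures in each result row appear to be derived from the four benchmark scores: an unweighted average, and that average expressed as a percentage of the top row's average. The page does not state this, so the sketch below is an assumption checked only against the first two rows. The score order (MMLU-CoT, GSM8k, HellaSwag, WinoGrande) is taken from the page title, and the helper names `average` and `relative_to_top` are hypothetical.

```python
from statistics import mean

# Scores copied from the first two result rows, in the assumed order
# MMLU-CoT, GSM8k, HellaSwag, WinoGrande.
top_row = [72.76, 85.06, 80.01, 77.9]
second_row = [72.5, 84.8, 80.2, 77.4]


def average(scores):
    """Unweighted mean of the four benchmark accuracies."""
    return mean(scores)


def relative_to_top(scores, reference):
    """Average of `scores` as a percentage of the average of `reference`."""
    return 100 * average(scores) / average(reference)


print(round(average(top_row), 2))                      # 78.93
print(round(relative_to_top(second_row, top_row), 2))  # 99.74
```

Both printed values match the figures listed for those rows. A few other rows differ from this calculation in the last digits, which would be consistent with the page computing from unrounded scores, though that is not confirmed.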