Zero-shot General Evaluation on HellaSwag, MathQA, MMLU, OpenBookQA, WinoGrande, GSM8K, and HumanEval

82.72 HellaSwag Accuracy

FP16

May 22, 2026
Updated 1d ago

Evaluation Results

Date     HellaSwag  MathQA  MMLU   OpenBookQA  WinoGrande  GSM8K  HumanEval  Average
2026.05  82.72      63.85   84.53  44.2        76.8        77.18  95.73      75
2026.05  82.17      61.34   83.49  45.2        76.09       76.19  92.68      73.88
2026.05  82.11      61.17   83.81  45.6        77.03       76.57  92.68      74.14
2026.05  81.42      59.4    82.52  44.2        76.09       76.65  92.07      73.19
2026.05  81.35      60.03   78.77  45          72.85       83.47  56.1       68.22
2026.05  80.93      62.78   83.94  44.8        76.4        75.97  94.51      74.19
2026.05  79.92      54.07   75.83  43.4        72.14       79.45  38.41      63.32
2026.05  79.81      57.05   76.45  41.8        70.64       82.64  56.1       66.36
2026.05  79.55      57.62   79.97  43.8        71.35       80.82  53.05      66.59
2026.05  79.24      60.17   76.98  44.8        74.19       85.37  50.61      67.34
2026.05  78.73      50.95   79.68  43.8        70.4        66.49  91.46      68.79
2026.05  78.69      49.75   79.76  43.8        71.74       71.49  91.46      69.53
2026.05  78.55      49.01   75.37  44.4        71.67       79.53  43.29      63.12
2026.05  78.02      60.67   81.47  44.8        75.85       71.49  92.68      72.14
2026.05  77.7       39.03   55.6   44.4        70.88       39.12  26.83      50.51
2026.05  76.83      36.45   53.16  44.4        70.88       32.15  21.34      47.89
2026.05  76.4       37.29   53.58  43          70.56       34.57  20.12      47.93
2026.05  76.15      38.39   54.64  43.6        70.17       33.66  25         48.8
2026.05  75.88      37.29   50.92  44.2        69.3        30.4   26.22      47.74
2026.05  75.06      38.16   53.59  43.2        70.88       30.33  27.44      48.38
2026.05  74.09      52.7    70.87  43.4        72.93       75.51  43.9       61.91
2026.05  70.24      27.47   54.63  38.6        65.59       18.35  1.22       39.44
2026.05  70.16      24.89   39.17  39.4        60.62       4.32   0          34.08
2026.05  69.96      33.37   46.41  39.2        68.82       15.47  14.02      41.04
2026.05  69.42      30.82   41.8   37.2        65.59       11.37  8.54       37.82
2026.05  67.73      29.51   43.41  38.2        63.54       12.43  11.59      38.06
2026.05  66.44      45.09   70.02  40          67.72       49.36  26.83      52.21
2026.05  66.25      32.19   46.29  39.6        69.85       15.85  12.8       40.4
2026.05  63.05      33.17   48.82  36.4        60.85       22.67  11.59      39.51
2026.05  62.82      31.83   44.51  35.4        59.98       19.64  7.93       37.44
2026.05  61.44      25.09   27.72  35.8        59.98       2.96   0          30.43
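
The last value in each row is consistent with an unweighted mean of the seven benchmark scores that precede it. A minimal Python sketch, assuming that reading and assuming the scores follow the benchmark order given in the title, checks this for the top row (`mean_score` is a hypothetical helper, not part of the leaderboard):

```python
# Scores from the top row of the results. The mapping of values to benchmark
# names is an assumption based on the order in which the title lists them.
scores = {
    "HellaSwag": 82.72,
    "MathQA": 63.85,
    "MMLU": 84.53,
    "OpenBookQA": 44.2,
    "WinoGrande": 76.8,
    "GSM8K": 77.18,
    "HumanEval": 95.73,
}


def mean_score(benchmark_scores):
    """Unweighted mean across benchmarks, rounded to two decimals."""
    return round(sum(benchmark_scores.values()) / len(benchmark_scores), 2)


reported_average = 75.0  # final value in the top row
computed_average = mean_score(scores)

# The two values should agree up to the rounding used on the page.
print(computed_average, abs(computed_average - reported_average) <= 0.01)
```

The same check can be run on any other row by swapping in its seven scores and its final value.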