Large Language Model Evaluation on HuggingFace Open LLM Leaderboard (lm-eval-harness default)

84.34 HellaSwag

Teacher

[Chart: HellaSwag score by date, Jan 17, 2026 to May 6, 2026; y-axis ticks from 64.28 to 79.90]
Updated 27d ago

Evaluation Results

Date     HellaSwag
2026.01  84.34      67.06  79.74  58.51  80.58  84.23  75.74  -      -
2026.01  82.42      60.84  65.26  52.16  78.31  54.87  65.64  -      -
2026.01  82.25      63.92  66.65  55.22  79.29  57.47  67.42  -      -
2026.01  81.99      57.59  65.48  45.19  77.43  50.27  62.99  -      -
2026.01  81.08      61.92  66.73  53.86  79.05  54.31  66.16  -      -
2026.05  80.97      -      74.23  62.66  74.03  87.21  -      51.46  64.33
2026.01  80.94      60.92  65.58  51.72  77.42  50.64  64.54  -      -
2026.01  80.87      60.93  65.39  51.99  77.35  50.95  64.58  -      -
2026.05  80.85      -      74.13  62.63  71.51  86.8   -      51.76  63.91
2026.01  80.34      63.57  74.28  56.37  75.77  81.34  71.95  -      -
2026.05  79.71      -      67.94  52.93  76.64  79.27  -      43.99  60.92
2026.05  79.46      -      74.08  54.86  75.63  79.38  -      41.22  63.96
2026.05  79.42      -      68.2   54.56  77.5   69.67  -      44.54  60.49
2026.05  79.42      -      68.2   54.56  77.5   69.67  -      44.54  60.49
2026.05  79.38      -      68.48  52.67  76.64  72.1   -      46.03  60.41
2026.01  79.36      57.69  64.96  50.31  77.66  50.16  63.35  -      -
2026.05  79.35      -      74.17  64.75  74.51  83.47  -      57.3   66.72
2026.05  79.35      -      74.17  64.75  74.51  83.47  -      57.3   66.72
2026.01  79.24      58.19  64.82  51.77  74.82  50.11  63.16  -      -
2026.05  78.55      -      73.09  52.28  75.77  74.22  -      40.3   60.58
2026.05  78.31      -      68.67  47.39  76.09  46.62  -      30.5   57.59
2026.05  77.7       -      65.29  52.06  76.56  76.79  -      34.94  58.7
2026.05  72.53      -      70.17  54.55  68.88  81.21  -      22.26  63.64
2026.05  72.43      -      70.18  54.49  68.69  81.2   -      20.89  63.57
2026.05  69.97      -      69.99  53.91  67.69  84.62  -      29.39  61.95
2026.05  69.8       -      70.12  53.07  68.19  79.45  -      25.87  61.77
2026.01  67.3       40.61  31.08  46.34  64.5   9.72   43.26  -      -
2026.01  66.35      40.1   31.13  41.79  63.3   7.43   41.68  -      -
2026.01  66.23      40.92  31.43  43.49  64.34  9.13   42.6   -      -
2026.01  65.95      39.59  31.73  41.17  62.87  6.78   41.35  -      -
2026.05  65.62      -      70.07  54.85  67.56  85.14  -      23.84  57.84
2026.05  65.62      -      70.07  54.85  67.56  85.14  -      23.84  57.84
2026.01  65.59      39.33  31.86  37.66  62.75  6.82   40.67  -      -
2026.01  65.46      39.76  31.19  41.73  63.14  7.12   41.4   -      -
2026.01  65.09      40.02  31.15  41.2   62.77  5.77   41     -      -
2026.01  65.05      40.16  31.11  40.72  62.89  6.77   41.12  -      -