
Zero-shot Downstream Evaluation on LM Evaluation Harness v1.0.0

HellaSwag Accuracy: 50.1

Baseline FT

[Chart: y-axis ticks from 47.188 to 49.456]
May 13, 2026
Updated 14d ago

Evaluation Results

Method  Links
2026.05   50.1   61.5   36.7   69.2   57     70.3   45.2   50     55
2026.05   50     66.5   37.3   69.4   56.7   70.3   44.8   49.8   55.6
2026.05   49.4   60.2   35.3   68.8   57.9   70.3   42.5   48.4   54.1
2026.05   47.3   56.2   33.9   67.4   55.9   63.8   40.3   40     50.6
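
The page does not label the score columns, so the following is an inference, not something the source states: in each row above, the ninth value appears to be the unweighted mean of the eight values before it. A minimal Python sketch of that check, with the numbers transcribed from the rows and hypothetical names (`rows`, `mean_matches`) used only for illustration:

```python
# Scores transcribed from the four result rows; the last entry of each row
# is ASSUMED to be the reported average (the source does not label columns).
rows = [
    [50.1, 61.5, 36.7, 69.2, 57.0, 70.3, 45.2, 50.0, 55.0],
    [50.0, 66.5, 37.3, 69.4, 56.7, 70.3, 44.8, 49.8, 55.6],
    [49.4, 60.2, 35.3, 68.8, 57.9, 70.3, 42.5, 48.4, 54.1],
    [47.3, 56.2, 33.9, 67.4, 55.9, 63.8, 40.3, 40.0, 50.6],
]


def mean_matches(row, tol=0.05):
    """Return True if the last value equals the mean of the others within tol."""
    *scores, reported = row
    return abs(sum(scores) / len(scores) - reported) <= tol


results = [mean_matches(r) for r in rows]
print(results)
```

If the check holds for every row, the final column can be read as a per-row average; it does not say which benchmarks the other columns correspond to.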