LLM Performance Estimation on HellaSwag (test)

0.827 MAE (%)

SparseEval

[Chart: leaderboard results over time; x-axis label: Feb 8, 2026]
Updated 4d ago

Evaluation Results

Date      MAE (%)   Second score (column label not preserved)
2026.02   0.827     0.918
2026.02   0.942     0.91
2026.02   0.993     0.906
2026.02   1.21      0.89
2026.02   1.477     0.857
2026.02   1.75      0.783
2026.02   1.868     0.861
2026.02   1.957     0.857
2026.02   1.962     0.847
2026.02   1.968     0.876
2026.02   1.992     0.784
2026.02   2.012     0.889
2026.02   2.257     0.847
2026.02   2.352     0.811
2026.02   2.416     0.856
2026.02   2.619     0.875
2026.02   2.754     0.745
2026.02   3.272     0.796
2026.02   3.501     0.687
2026.02   5.323     0.661
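
The page does not define how MAE (%) is computed. As a rough sketch, assuming it is the mean absolute error between each model's estimated accuracy and its actual accuracy on the benchmark, expressed in percentage points, the calculation could look like the following. The function name `mae_percent` and the sample values are hypothetical and are not taken from the leaderboard.

```python
from statistics import fmean


def mae_percent(estimated, actual):
    """Mean absolute error in percentage points.

    Assumption: both inputs are per-model accuracies given as
    fractions in [0, 1], listed in the same model order.
    """
    if len(estimated) != len(actual) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the per-model absolute gaps, then scale to percent.
    return 100.0 * fmean(abs(e - a) for e, a in zip(estimated, actual))


# Hypothetical accuracies for three models (illustrative only).
estimated_acc = [0.80, 0.75, 0.62]
actual_acc = [0.81, 0.73, 0.62]
print(f"MAE: {mae_percent(estimated_acc, actual_acc):.3f}%")
```

Under this reading, a lower value means the estimated accuracies track the actual ones more closely, which matches the ascending order of the MAE (%) column.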