
HuggingFace Open LLM Leaderboard

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Large Language Model Evaluation | HuggingFace Open LLM Leaderboard | GSM8K: 55.37 | 49 |
| Large Language Model Evaluation | HuggingFace Open LLM Leaderboard lm-eval-harness default (various) | HellaSwag: 84.34 | 36 |
| General language understanding and reasoning | Huggingface Open LLM Leaderboard | HellaSwag Accuracy: 85.32 | 30 |
| LLM Evaluation | HuggingFace Open LLM Leaderboard Old (test) | GSM8K Score: 92.08 | 14 |
| General Language Understanding | HuggingFace Open LLM Leaderboard New | BBH: 68.84 | 7 |