HuggingFace Open LLM Leaderboard

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Large Language Model Evaluation | HuggingFace Open LLM Leaderboard | GSM8K: 55.37 | 49 |
| General language understanding and reasoning | Huggingface Open LLM Leaderboard | HellaSwag Accuracy: 62 | 20 |
| Large Language Model Evaluation | HuggingFace Open LLM Leaderboard lm-eval-harness default (various) | HellaSwag: 84.34 | 18 |