MMLU, GSM8k, HellaSwag, WinoGrande

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Large Language Model Evaluation | MMLU, GSM8k, HellaSwag, WinoGrande | Average Score: 78.9 | 58 |
| Language Modeling Evaluation | MMLU, GSM8k, HellaSwag, WinoGrande | MMLU Accuracy: 72.98 | 17 |
| Natural Language Understanding and Mathematical Reasoning | MMLU, GSM8k, HellaSwag, WinoGrande (test) | MMLU Accuracy: 77.18 | 13 |
| Large Language Model Evaluation | MMLU, GSM8k, HellaSwag, WinoGrande (test) | MMLU Accuracy: 86.55 | 13 |
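The "Average Score" metric in the first row is presumably the unweighted mean of a model's scores on the four component benchmarks. A minimal sketch of that aggregation, assuming that convention; the per-benchmark numbers are hypothetical, chosen only so the mean matches the listed SOTA of 78.9:

```python
# Sketch: aggregate four benchmark scores into a single "Average Score".
# The individual scores below are hypothetical, for illustration only.
scores = {
    "MMLU": 77.2,        # accuracy, %
    "GSM8k": 80.5,       # solve rate, %
    "HellaSwag": 83.1,   # accuracy, %
    "WinoGrande": 74.8,  # accuracy, %
}

# Unweighted mean across the four tasks (the assumed aggregation rule).
average_score = sum(scores.values()) / len(scores)
print(f"Average Score: {average_score:.1f}")  # -> Average Score: 78.9
```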