
ARC, TruthfulQA, Winogrande, GSM8K, HellaSwag, MMLU

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Large Language Model Evaluation | ARC, TruthfulQA, Winogrande, GSM8K, HellaSwag, MMLU | ARC Accuracy: 73.7 | 16