GSM8K, TruthfulQA, CommonsenseQA, MMLU, ARC, and TriviaQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Large Language Model Evaluation | GSM8K, TruthfulQA, CommonsenseQA, MMLU, ARC, and TriviaQA (various) | Accuracy: 88 | 9 |