
ARC Challenge, BoolQ, OpenbookQA, GSM8K, MMLU

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Downstream Task Evaluation | ARC Challenge, BoolQ, OpenbookQA, GSM8K (Strict), MMLU | ARC Challenge Accuracy: 66.72 | 5 |