
GSM8K, CommonSense, BoolQ, ARC, and HellaSwag

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Reasoning and Math | GSM8K, CommonSense, BoolQ, ARC Challenge, and HellaSwag | Average Accuracy: 87.8 | 9