Commonsense and Reading Comprehension Suite

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Reasoning | 7-benchmark commonsense and reading-comprehension suite (ARC-Easy, ARC-Challenge, HellaSwag, WinoGrande, PIQA, BoolQ, and OpenBookQA), LM Evaluation Harness default (test) | Accuracy: 68.77 | 108