Commonsense and Reading Comprehension Suite

Benchmarks

Task Name: Reasoning
Dataset Name: 7-benchmark commonsense and reading-comprehension suite (ARC-Easy, ARC-Challenge, HellaSwag, WinoGrande, PIQA, BoolQ, and OpenBookQA), LM Evaluation Harness default (test)
SOTA Result: Accuracy 68.77
Trend: 108