
ARC-Easy, ARC-Challenge, OpenBookQA, WinoGrande, PIQA, HellaSwag, MathQA, RTE, BoolQ

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Zero-shot evaluation | ARC-Easy, ARC-Challenge, OpenBookQA, WinoGrande, PIQA, HellaSwag, MathQA, RTE, BoolQ zero-shot | Mean Accuracy: 71.08 | 59