
ARC-e, BoolQ, HellaSwag, LAMBADA, PIQA, RACE, SocialIQA, SciQ, SWAG

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Zero-shot Downstream Task Evaluation | ARC-e, BoolQ, HellaSwag, LAMBADA, PIQA, RACE, SocialIQA, SciQ, SWAG | ARC-e Accuracy: 77.9 | 12