ARC, HellaSwag, LAMBADA, PIQA

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Zero-shot Language Understanding | ARC-Easy, ARC-Challenge, HellaSwag, LAMBADA, PIQA lm-eval 0.4.11 (test) | Average Accuracy: 81.5 | 42