
GPT-3 Evaluation Suite

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
General Language Understanding | GPT-3 Evaluation Suite (11 tasks: HellaSwag, LAMBADA, TriviaQA, WebQs, Winogrande, PIQA, ARC Challenge, ARC Easy, ANLI R1, ANLI R2, ANLI R3) zero-shot | Average Accuracy: 33.6 | 4
Few-shot Language Understanding | GPT-3 Evaluation Suite Few-shot | Accuracy: 48.1 | 3
Zero-shot Evaluation | GPT-3 Evaluation Suite (LAMBADA, TriviaQA, WebQs, PIQA, RACE-h, BoolQ) 1.3B various (test val) | Overall Accuracy: 44.4 | 3