Task suite: WikiText, LAMBADA, TriviaQA, PIQA, HellaSwag, WinoGrande, ARC-Easy, GPQA, Social IQA, OpenBookQA, SciQ

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Language Modeling and Question Answering | Short-context task suite (WikiText, LAMBADA, TriviaQA, PIQA, HellaSwag, WinoGrande, ARC-Easy, GPQA, Social IQA, OpenBookQA, SciQ) (test) | WikiText PPL: 14.4 | 18