Standard Downstream Tasks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Zero/Few-shot Language Modeling | Standard Downstream Tasks (arc-c, arc-e, boolq, hellaswag, piqa, siqa, winogrande) | ARC-C Accuracy: 70.65 | 55 |
| Zero-shot Reasoning and Question Answering | Standard Downstream Tasks PIQA, HellaSwag, Winogrande, ARC-Challenge, ARC-Easy | PIQA Zero-Shot Accuracy: 77.47 | 9 |
| Language Understanding | Standard Downstream Tasks (ARC, COPA, BoolQ, PIQA, StoryCloze, RTE, MMLU) | ARC (Challenge): 49.57 | 8 |