Downstream Reasoning Tasks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Zero-shot Question Answering | Downstream Reasoning Tasks (ARC-c, ARC-e, BoolQ, HellaSwag, MMLU, OpenBookQA, PIQA, Winogrande) | ARC-c Accuracy (Zero-shot): 58.4 | 15 |
| Zero-shot Reasoning | Downstream Reasoning Tasks (WikiText-2, ARC-e, ARC-c, BoolQ, PIQA, SIQA, HellaS., OBQA, Wino.) | WikiText-2 Acc: 11.78 | 6 |