ARC, PIQA, OpenbookQA, Winogrande, HellaSwag, MathQA

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Zero-shot Reasoning | ARC-e, PIQA, OpenbookQA, Winogrande, HellaSwag, MathQA | Average Accuracy: 57 | 19