
MMLU, ARC-C, PIQA, WinoG, GSM8K, HellaSwag, GPQA, RACE

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Zero-shot language evaluation | MMLU, ARC-C, PIQA, WinoG, GSM8K, HellaSwag, GPQA, RACE zero-shot | Average Score: 60.94 | 9