
nine-benchmark suite

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Zero-shot Language Understanding and Reasoning | nine-benchmark suite (MMLU, ARC, CSQA, HellaSwag, OpenBookQA, PIQA, SocialIQA, WinoGrande) (test val) | MMLU Accuracy: 31.7 | 6