LLM Benchmark Suite

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Language Understanding | LLM Benchmark Suite (MMLU, ARC-C, PIQA, WinoG, GSM8K, HellaSwag, GPQA, RACE) (test) | Overall Accuracy: 57.93 | 13