
LLM Benchmark Suite

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Language Understanding | LLM Benchmark Suite (MMLU, ARC-C, PIQA, WinoG, GSM8K, HellaSwag, GPQA, RACE) (test) | Overall Accuracy: 57.93 | 13