Overall LLM Evaluation Suite

Benchmarks

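Task Name                      | Dataset Name                                                             | SOTA Result           | Trend
-------------------------------|--------------------------------------------------------------------------|-----------------------|------
General Language Understanding | Overall LLM Evaluation Suite (PiQA, ARC, HellaSwag, WinoGrande, MMLU v1) | Overall Accuracy 74.6 | 16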