GSM8K, Math, AIME, HumanEval, LiveCodeBench, ARC, MMLU, GPQA

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Reasoning and Knowledge | GSM8K, Math, AIME, HumanEval, LiveCodeBench, ARC-C, ARC-E, MMLU, GPQA | GSM8K: 95.41 | 9