
Downstream Reasoning Benchmarks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Reasoning | Downstream Reasoning Benchmarks (MATH, GSM8K, AQUA, AIME, AMC, MMLU, GPQA) | Average Accuracy: 82.15 | 18