Downstream Reasoning Benchmarks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Reasoning | Downstream Reasoning Benchmarks (MATH, GSM8K, AQUA, AIME, AMC, MMLU, GPQA) | Average Accuracy: 82.15 | 18