Reasoning and Code Generation Suite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Reasoning and Code Generation | Reasoning and Code Generation Suite (MATH, GSM8K, MBPP, TheoremQA, BBH) (test) | MATH Accuracy: 54.38 | 6 |