
GSM-Plus

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mathematical Reasoning | GSM-Plus | Accuracy: 67.5 | 66 |
| Mathematical Reasoning | GSM-Plus (test) | Accuracy: 68.8 | 50 |
| Mathematical Reasoning | GSM-Plus (mini) | Accuracy: 52.8 | 8 |
| Math Word Problem Solving | GSM+ v1 (test) | Accuracy: 65.7 | 6 |