
GSM-Hard

Benchmarks

| Task | Dataset | Metric | SOTA result | Trend |
|---|---|---|---|---|
| Math reasoning | GSM-Hard (test) | Accuracy | 55.94 | 30 |
| Mathematical Reasoning | GSM-Hard Out-of-Distribution (test) | Final Answer Accuracy | 71 | 5 |
| Mathematical Reasoning | GSM-Hard OOD | Base Accuracy | 11.4 | 2 |
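Each benchmark above reports its result as a single accuracy-style number, and the page does not define these metrics. The sketch below is only an illustration that assumes "final answer accuracy" means the percentage of problems whose extracted numeric answer matches the reference answer. The helper name `final_answer_accuracy` and the `rel_tol` tolerance are hypothetical and are not taken from any benchmark's official evaluation code.

```python
import math


def final_answer_accuracy(predictions, references, rel_tol=1e-6):
    """Return the percentage of predictions that match their reference answer.

    Hypothetical helper: assumes final answers have already been extracted
    as numbers. A prediction of None (no answer extracted) counts as wrong.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Compare numerically with a small relative tolerance so that 42 and 42.0 match.
    correct = sum(
        1
        for pred, ref in zip(predictions, references)
        if pred is not None and math.isclose(pred, ref, rel_tol=rel_tol)
    )
    return 100.0 * correct / len(references)


# Usage with made-up values (not benchmark data): 3 of 4 answers match.
print(final_answer_accuracy([42.0, 7.5, None, 1000003.0], [42, 7.5, 3, 1000003]))
```

A tolerance-based comparison is used instead of string matching so that equivalent numeric forms are not penalized. Real evaluation harnesses may extract and normalize answers differently.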