GSM-8K

Benchmarks

Task Name               Dataset Name  SOTA Result        Trend
Mathematical Reasoning  GSM-8K        Accuracy 97.3      57
Mathematical Reasoning  GSM-8K        GSM Accuracy 84.8  18
Mathematical Reasoning  GSM-8K        Accuracy 89.11     16
Overrefusal Evaluation  GSM-8k        RR 0               6
Mathematical Reasoning  GSM-8K        Accuracy 95.6      2