
GSM-Infinite

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| Mathematical Reasoning | GSM-Infinite (Avg) | Accuracy 17.1 | 24 |
| Mathematical Reasoning | GSM-Infinite 32K | Accuracy 15.4 | 24 |
| Mathematical Reasoning | GSM-Infinite 16K | Accuracy 16.2 | 24 |
| Mathematical Reasoning | GSM-Infinite (8K) | Accuracy 22.9 | 24 |
| Mathematical Reasoning | GSM-Infinite | Accuracy (8K) 22.9 | 17 |
| Mathematical Reasoning | GSM-Infinite Hard | Accuracy 50.4 | 16 |
| Long-Context Mathematical Reasoning | GSM-Infinite | Accuracy 87.06 | 11 |