Mathematical Reasoning on GSM8K (Pass@1 Accuracy)
[Chart: GSM8K Pass@1 Accuracy. Best result: Claude3.5-Sonnet, 96.4 (Feb 18, 2025)]
Evaluation Results

Method                    Method details              Links     GSM8K Pass@1 Accuracy
Claude3.5-Sonnet          Decoding=Greedy             2025.02   96.4
GPT-o1-mini               Decoding=Greedy             2025.02   94.8
GPT-4o                    Decoding=Greedy             2025.02   92.9
Qwen2.5-Math-7B-S2R-ORL   Backbone=Qwen2.5-Math-...   2025.02   92.9
Llama-3.1-8B-S2R-ORL      Backbone=Llama-3.1-8B,...   2025.02   87.3
Qwen2-7B-S2R-ORL          Backbone=Qwen2-7B, Tra...   2025.02   86.4