Mathematical Problem-solving on GSM8K (Pass@1)
[Chart: Pass@1 results over time, April 2025 to May 2026; best reported score 91.6 (Qwen-2.5-7B)]
Evaluation Results

| Method | Setting | Date | Pass@1 (%) |
|---|---|---|---|
| Qwen-2.5-7B | Turkish Support=Advanc... | 2026.01 | 91.6 |
| R-Zero | Iteration=3 | 2026.05 | 91.49 |
| R-Zero | Iteration=2 | 2026.05 | 91.37 |
| R-Zero | Iteration=1 | 2026.05 | 90.65 |
| VHG (Soft) | - | 2026.05 | 90.61 |
| Vanilla-GRPO | - | 2026.05 | 90.17 |
| Vanilla-GRPO | training_variant=w. SF... | 2026.05 | 89.89 |
| Llama-3.1-8B | Turkish Support=Limite... | 2026.01 | 84.5 |
| Qwen3-4B-Base | - | 2026.05 | 73.91 |
| Fine-tuned | CR=1, Backbone=LLaMA2-... | 2025.04 | 63.96 |
| IMPART | CR=32, Backbone=LLaMA2... | 2025.04 | 60.2 |
| DARE | CR=32, Backbone=LLaMA2... | 2025.04 | 58.91 |
| LowRank | CR=32, Backbone=LLaMA2... | 2025.04 | 56.25 |
| Mistral-7B-v0.3 | Turkish Support=Modera... | 2026.01 | 55 |
| Backbone | CR=1, Backbone=LLaMA2-... | 2025.04 | 17.8 |
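The leaderboard reports Pass@1 without defining how each entry scored it. A common convention on GSM8K is to extract the final numeric answer from a model's output, compare it with the number after "####" in the reference solution, and average correctness over problems. The sketch below illustrates that convention under stated assumptions: the helper names (`extract_final_number`, `is_correct`, `pass_at_k`, `benchmark_pass_at_1`) are hypothetical, the "last number in the text" parsing rule is a simplification, and the individual papers on this board may use different answer parsing, prompting, or sampling setups. `pass_at_k` implements the standard unbiased pass@k estimator, 1 - C(n-c, k)/C(n, k).

```python
import re
from math import comb
from typing import List, Optional, Tuple

# Matches integers and decimals, allowing thousands separators such as "1,203".
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text: str) -> Optional[float]:
    """Return the last number in `text`, or None if it contains no number.

    Hypothetical parsing rule: the final answer is the last number in the
    output. GSM8K reference solutions end with "#### <answer>", so the same
    rule also recovers the reference answer.
    """
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def is_correct(prediction: str, reference: str, tol: float = 1e-6) -> bool:
    """Compare the final numbers of a model output and a reference solution."""
    pred = extract_final_number(prediction)
    ref = extract_final_number(reference)
    return pred is not None and ref is not None and abs(pred - ref) <= tol


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem, given n samples of which c are correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: List[Tuple[int, int]]) -> float:
    """Average pass@1 over problems, as a percentage.

    `results` holds one (n_samples, n_correct) pair per problem.
    """
    scores = [pass_at_k(n, c, 1) for n, c in results]
    return 100.0 * sum(scores) / len(scores)
```

With a single sample per problem (n = 1), `pass_at_k(1, c, 1)` is simply 1 or 0, so the benchmark score is plain accuracy; with n > 1 samples the estimator reduces to c/n, the per-problem fraction of correct samples.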