Mathematical Reasoning on GSM8K (Avg@3)
Best Avg@3 Score: 89.6 (ADORA, Feb 10, 2026)
Evaluation Results
| Method | Configuration | Date | Avg@3 Score |
| --- | --- | --- | --- |
| ADORA | Backbone=Qwen2.5-7B, T... | 2026.02 | 89.6 |
| GRPO | Backbone=Qwen2.5-7B, S... | 2026.02 | 89.1 |
| ADORA | Backbone=DeepSeek-Math... | 2026.02 | 68.5 |
| GRPO | Backbone=DeepSeek-Math... | 2026.02 | 68.2 |
| ADORA | Backbone=Llama-3.1-8B,... | 2026.02 | 66.7 |
| GRPO | Backbone=Llama-3.1-8B,... | 2026.02 | 66.1 |
| Qwen2.5-7B | Training Method=Base,... | 2026.02 | 56.3 |
| GRPO | Backbone=Mistral-v0.1-... | 2026.02 | 54 |
| ADORA | Backbone=Mistral-v0.1-... | 2026.02 | 53.8 |
| Llama-3.1-8B | Training Method=Base,... | 2026.02 | 40.2 |
| DeepSeek-Math-7B | Training Method=Base,... | 2026.02 | 28.4 |
| Mistral-v0.1-7B | Training Method=Base,... | 2026.02 | 21.2 |
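The page does not define how the Avg@3 Score is computed. A common reading, assumed here, is the mean accuracy (in percent) over three independent evaluation runs on the same problem set. The sketch below illustrates that reading only; the function name `avg_at_k` and the toy data are hypothetical and are not taken from the leaderboard or from any method listed on it.

```python
from statistics import mean


def avg_at_k(runs: list[list[bool]]) -> float:
    """Mean accuracy in percent over k independent evaluation runs.

    runs[i][j] is True when problem j was answered correctly in run i.
    Assumption: Avg@k is the plain average of per-run accuracies.
    """
    if not runs:
        raise ValueError("need at least one run")
    n_problems = len(runs[0])
    if n_problems == 0 or any(len(r) != n_problems for r in runs):
        raise ValueError("every run must cover the same non-empty problem set")
    # Accuracy of each run, expressed as a percentage.
    per_run_accuracy = [100.0 * sum(r) / n_problems for r in runs]
    return mean(per_run_accuracy)


# Toy data (hypothetical): 3 runs over 4 problems.
toy_runs = [
    [True, True, False, True],   # 75% correct
    [True, False, False, True],  # 50% correct
    [True, True, True, True],    # 100% correct
]
toy_score = avg_at_k(toy_runs)
print(f"Avg@3 = {toy_score:.1f}")
```

Averaging over several runs reduces the effect of sampling noise on a single evaluation, which matters when compared entries differ by less than one point.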