Mathematical and General Reasoning on GSM8k + MATH 1.0 (test)
[Chart: GSM8k Accuracy by method; top result 94.7 (BF16), Jan 20, 2026. Selectable metrics: GSM8k Accuracy, MATH 500 Accuracy, GPQA Accuracy, SuperGPQA Accuracy, Average Accuracy.]
Updated 3d ago
Evaluation Results
| Method | Settings | Date | GSM8k Accuracy | MATH 500 Accuracy | GPQA Accuracy | SuperGPQA Accuracy | Average Accuracy |
|---|---|---|---|---|---|---|---|
| BF16 | Model=Qwen3-8B-Base, R... | 2026.01 | 94.7 | 81.7 | 46.2 | 33.5 | 64 |
| Jet-RL | Model=Qwen3-8B-Base, R... | 2026.01 | 92.1 | 76.7 | 43.3 | 33.2 | 61.3 |
| BF16 | Model=Qwen2.5-7B, Roll... | 2026.01 | 91.5 | 72.8 | 28.4 | 42.2 | 58.7 |
| BF16-Train-FP8-Rollout | Model=Qwen2.5-7B, Roll... | 2026.01 | 90.7 | 68.1 | 28.9 | 26.9 | 53.7 |
| Jet-RL | Model=Qwen2.5-7B, Roll... | 2026.01 | 89.3 | 69.2 | 36.5 | 27.8 | 55.7 |
| Before Tuning | Model=Qwen3-8B-Base, R... | 2026.01 | 63.7 | 69.7 | 45.7 | 30.7 | 52.5 |
| Before Tuning | Model=Qwen2.5-7B, Roll... | 2026.01 | 13.5 | 45.3 | 28.4 | 26 | 28.3 |
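The page does not say how Average Accuracy is computed. The reported values are consistent with an unweighted mean of the four benchmark scores (GSM8k, MATH 500, GPQA, SuperGPQA), rounded half-up to one decimal place. The sketch below checks that assumption against the listed results; the `ROWS` layout and the `mean_accuracy` helper are illustrative names, not part of the leaderboard.

```python
from decimal import Decimal, ROUND_HALF_UP

# (method, model, GSM8k, MATH 500, GPQA, SuperGPQA, reported average)
# Scores are copied from the results above and kept as strings so that
# Decimal arithmetic stays exact.
ROWS = [
    ("BF16", "Qwen3-8B-Base", "94.7", "81.7", "46.2", "33.5", "64"),
    ("Jet-RL", "Qwen3-8B-Base", "92.1", "76.7", "43.3", "33.2", "61.3"),
    ("BF16", "Qwen2.5-7B", "91.5", "72.8", "28.4", "42.2", "58.7"),
    ("BF16-Train-FP8-Rollout", "Qwen2.5-7B", "90.7", "68.1", "28.9", "26.9", "53.7"),
    ("Jet-RL", "Qwen2.5-7B", "89.3", "69.2", "36.5", "27.8", "55.7"),
    ("Before Tuning", "Qwen3-8B-Base", "63.7", "69.7", "45.7", "30.7", "52.5"),
    ("Before Tuning", "Qwen2.5-7B", "13.5", "45.3", "28.4", "26", "28.3"),
]


def mean_accuracy(scores):
    """Unweighted mean of the scores, rounded half-up to one decimal.

    Assumption: this is how the Average Accuracy column is derived.
    """
    total = sum(Decimal(s) for s in scores)
    return (total / len(scores)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def check_rows(rows):
    """Return a list of (method, model, computed, reported, matches)."""
    results = []
    for method, model, *scores, reported in rows:
        computed = mean_accuracy(scores)
        results.append((method, model, computed, Decimal(reported), computed == Decimal(reported)))
    return results


if __name__ == "__main__":
    for method, model, computed, reported, ok in check_rows(ROWS):
        status = "ok" if ok else "MISMATCH"
        print(f"{method:<24} {model:<14} computed={computed} reported={reported} {status}")
```

Half-up rounding is needed for rows whose mean falls exactly on a half step (for example 53.65 and 52.45); binary floating point with the built-in `round` can land on the other side for such values, which is why the sketch uses `Decimal`.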