Mathematical and General Reasoning on DeepMATH (test)
[Chart: MATH 500 Score. Best result: 83.4 (BF16), Jan 20, 2026. Selectable metrics: MATH 500 Score, GSM8k Score, GPQA Score, SuperGPQA Score, Average Score.]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Links | MATH 500 Score | GSM8k Score | GPQA Score | SuperGPQA Score | Average Score |
|---|---|---|---|---|---|---|---|
| BF16 | Model=Qwen3-8B-Base, R... | 2026.01 | 83.4 | - | 44.2 | 36.1 | 54.6 |
| Jet-RL | Model=Qwen3-8B-Base, R... | 2026.01 | 80.2 | - | 47.2 | 33.8 | 53.7 |
| Before Tuning | Model=Qwen3-8B-Base, R... | 2026.01 | 69.7 | 63.4 | 46.2 | 31.8 | 52.8 |
| BF16-Train-FP8-Rollout | Model=Qwen3-8B-Base, R... | 2026.01 | 57.8 | - | 42.6 | 32.6 | 44.3 |
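The page does not say how the Average Score is computed. The reported values are consistent with a plain mean over whichever scores are listed for each method, skipping entries marked "-". The sketch below checks that reading against the numbers above; the averaging rule, the half-up rounding, and the names `ROWS` and `average_score` are assumptions made for illustration, not something the leaderboard documents.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the results above, in the order
# MATH 500, GSM8k, GPQA, SuperGPQA. "-" marks a score that is not reported.
ROWS = {
    "BF16": ["83.4", "-", "44.2", "36.1"],
    "Jet-RL": ["80.2", "-", "47.2", "33.8"],
    "Before Tuning": ["69.7", "63.4", "46.2", "31.8"],
    "BF16-Train-FP8-Rollout": ["57.8", "-", "42.6", "32.6"],
}


def average_score(scores):
    """Mean of the reported scores, rounded half-up to one decimal.

    Assumed rule: missing entries ("-") are skipped rather than counted as zero.
    """
    reported = [Decimal(s) for s in scores if s != "-"]
    mean = sum(reported) / len(reported)
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


averages = {name: average_score(scores) for name, scores in ROWS.items()}

for name, avg in averages.items():
    print(f"{name}: {avg}")
```

If this reading is right, the Before Tuning average covers four benchmarks while the other three cover only three, so the Average Score column is not a like-for-like comparison across methods.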