Mathematical Reasoning on MATH 500 (Accuracy Variance, Consistency, Token Variance)
[Chart: Mean Accuracy by model. Highlighted result: Gemma-3-12B-Instruct, 79.7 (Dec 2, 2025). Selectable metrics: Mean Accuracy, Accuracy Variance, Consistency Score, Output Token Variance.]
Evaluation Results
Method                 Setting    Date     Mean Accuracy  Accuracy Variance  Consistency Score  Output Token Variance
---------------------  ---------  -------  -------------  -----------------  -----------------  ---------------------
Gemma-3-12B-Instruct   Optimized  2025.12  79.7           0.0016             55.4               144,471.24
Qwen2.5-7B-Instruct    Optimized  2025.12  68.6           0.007              35.4               122,866.7
Gemma-3-12B-Instruct   Random     2025.12  67             0.002              49.5               153,144.84
Qwen2.5-7B-Instruct    Random     2025.12  58.5           0.007              35.4               305,133.92
Llama-3.1-8B-Instruct  Optimized  2025.12  28.5           0.026              1.1                1,547,849
Llama-3.1-8B-Instruct  Random     2025.12  24.9           0.015              2.7                1,512,173.79
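
Each model in the results is listed under both an Optimized and a Random setting, so one quick way to read them is to compare the two mean accuracies per model. The following is a minimal sketch of that arithmetic in Python. The numbers are copied from the results above; the dictionary layout and the `accuracy_gain` helper are illustrative names, not part of the benchmark, and the page does not define what the two settings mean, so the difference is only a descriptive gap between rows.

```python
# Mean accuracy per (model, setting), copied from the Evaluation Results.
# The data layout and helper name are assumptions made for illustration.
MEAN_ACCURACY = {
    ("Gemma-3-12B-Instruct", "Optimized"): 79.7,
    ("Gemma-3-12B-Instruct", "Random"): 67.0,
    ("Qwen2.5-7B-Instruct", "Optimized"): 68.6,
    ("Qwen2.5-7B-Instruct", "Random"): 58.5,
    ("Llama-3.1-8B-Instruct", "Optimized"): 28.5,
    ("Llama-3.1-8B-Instruct", "Random"): 24.9,
}


def accuracy_gain(scores):
    """Return {model: Optimized minus Random mean accuracy}, rounded to 1 decimal."""
    models = {model for model, _setting in scores}
    return {
        model: round(scores[(model, "Optimized")] - scores[(model, "Random")], 1)
        for model in sorted(models)
    }


gains = accuracy_gain(MEAN_ACCURACY)
for model, gain in gains.items():
    # Print the signed gap on the same scale as the reported mean accuracy.
    print(f"{model}: {gain:+.1f}")
```

The same pattern applies to the other columns (for example Output Token Variance) by swapping in those values.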