Multi-subject Reasoning on MMLU-Pro
Best result: Gemma-3-12B-Instruct, 65.6 Acc (Mean)
[Chart: results by date (data point Dec 2, 2025); selectable metrics: Acc (Mean), Acc Var, Consistency Score, Output Tokens Variance]
Updated 3mo ago
Evaluation Results
| Model | Setting | Date | Acc (Mean) | Acc Var | Consistency Score | Output Tokens Variance |
|---|---|---|---|---|---|---|
| Gemma-3-12B-Instruct | Optimized | 2025.12 | 65.6 | 0.0128 | 0.333 | 102,069.6 |
| Gemma-3-12B-Instruct | Random | 2025.12 | 50.1 | 0.008 | 0.263 | 114,524.53 |
| Qwen2.5-7B-Instruct | Optimized | 2025.12 | 49.7 | 0.018 | 0.174 | 128,362.8 |
| Qwen2.5-7B-Instruct | Random | 2025.12 | 39.9 | 0.015 | 0.114 | 372,446.18 |
| Llama-3.1-8B-Instruct | Optimized | 2025.12 | 37.3 | 0.011 | 0.155 | 513,244.2 |
| Llama-3.1-8B-Instruct | Random | 2025.12 | 31.9 | 0.01 | 0.109 | 822,103.05 |
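Each model is listed under both an Optimized and a Random setting, so the accuracy gap between the two can be computed directly from the Acc (Mean) values in the table. The short Python sketch below does exactly that. The numbers are copied from the table; the names `RESULTS` and `accuracy_gain` are hypothetical, chosen for illustration only, and the sketch makes no claim about how either setting was produced.

```python
# Acc (Mean) values copied from the Evaluation Results table (MMLU-Pro).
# RESULTS and accuracy_gain are hypothetical names used for illustration.
RESULTS = {
    "Gemma-3-12B-Instruct": {"Optimized": 65.6, "Random": 50.1},
    "Qwen2.5-7B-Instruct": {"Optimized": 49.7, "Random": 39.9},
    "Llama-3.1-8B-Instruct": {"Optimized": 37.3, "Random": 31.9},
}


def accuracy_gain(results):
    """Return Optimized minus Random Acc (Mean) per model, rounded to 0.1 point."""
    return {
        model: round(scores["Optimized"] - scores["Random"], 1)
        for model, scores in results.items()
    }


if __name__ == "__main__":
    for model, gain in accuracy_gain(RESULTS).items():
        print(f"{model}: +{gain} points")
```

Run as a script, it reports gains of 15.5, 9.8 and 5.4 points for Gemma-3-12B-Instruct, Qwen2.5-7B-Instruct and Llama-3.1-8B-Instruct respectively. Rounding to one decimal place matches the table's precision and hides floating-point noise from the subtraction.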