Multi-domain Knowledge and Reasoning on MMLU-Pro
Best result: Qwen3, 77.7 accuracy (Apr 2, 2026)
Metrics: Accuracy, Average Output Tokens
Updated 16d ago
Evaluation Results
Method                     Details                     Date      Accuracy   Average Output Tokens
Qwen3                      Size=14B                    2026.04   77.7       2,400
Apriel-Reasoner (Ours)     Size=15B                    2026.04   77.3       1,900
Phi-4-reasoning            Size=14B                    2026.04   77.1       3,400
Nemotron-Cascade           Size=14B                    2026.04   76.8       3,600
Apriel-Base                Size=15B                    2026.04   76.4       3,500
Apriel-Base + RLVR w/ LP   Size=15B, Length Penal...   2026.04   75.6       1,500