Multitask Knowledge Evaluation on MMLU-Pro
Top Pass@1: 80 (Qwen3), May 13, 2026
Updated 20d ago
Evaluation Results
Method            Model Category   Date      Pass@1
Qwen3             Origina...       2026.05   80
Qwen3-8192        Origina...       2026.05   76.5
LUFFY             Off-Pol...       2026.05   50.1
TGPO-annealing    On-Poli...       2026.05   50.1
TGPO              On-Poli...       2026.05   48.9
TGPOR             On-Poli...       2026.05   48.1
GRPO++            On-Poli...       2026.05   46.9
KDRL              On-Poli...       2026.05   46.9
SFT               Off-Pol...       2026.05   44.9
Oat-Zero          On-Poli...       2026.05   41.7
SimpleRL-Zero     On-Poli...       2026.05   34.5
PRIME-Zero        On-Poli...       2026.05   32.7
OP Distill        On-Poli...       2026.05   23
Qwen2.5-Math-7B   Origina...       2026.05   16.9