STEM Reasoning on MMLU-Pro
Top result: 90.8 Pass@1 Accuracy, Qwen3-30B-A3B (Thinking), Apr 10, 2026
Updated 5d ago
Evaluation Results
| Method | Setting | Date | Pass@1 Accuracy |
| --- | --- | --- | --- |
| Qwen3-30B-A3B (Thinking) | Sampling Strategy=4-sa... | 2026.04 | 90.8 |
| Gemini 2.5 Flash | Sampling Strategy=4-sa... | 2026.04 | 90.44 |
| GPT-OSS-120B | Sampling Strategy=4-sa... | 2026.04 | 90.11 |
| GPT-5 Mini | Sampling Strategy=4-sa... | 2026.04 | 89.64 |
| Aryabhata 2 | Sampling Strategy=4-sa... | 2026.04 | 88.49 |
| GPT-OSS-20B | Sampling Strategy=4-sa... | 2026.04 | 85.42 |
| Nemotron 3 Nano 30B A3B | Sampling Strategy=4-sa... | 2026.04 | 84.33 |
| GPT-5 Nano | Sampling Strategy=4-sa... | 2026.04 | 80.62 |