STEM Reasoning on MMLU-Redux 2.0
Top result: Qwen3-30B-A3B (Thinking), 97.77 Pass@1 Accuracy
[Chart: Pass@1 Accuracy over time (x-axis: Apr 10, 2026)]
Evaluation Results

Method                      Settings                   Date     Pass@1 Accuracy (%)
Qwen3-30B-A3B (Thinking)    Sampling Strategy=4-sa...  2026.04  97.77
Gemini 2.5 Flash            Sampling Strategy=4-sa...  2026.04  96.85
GPT-5 Mini                  Sampling Strategy=4-sa...  2026.04  96.4
GPT-OSS-120B                Sampling Strategy=4-sa...  2026.04  95.94
Nemotron 3 Nano 30B A3B     Sampling Strategy=4-sa...  2026.04  94.1
GPT-OSS-20B                 Sampling Strategy=4-sa...  2026.04  93.32
Aryabhata 2                 Sampling Strategy=4-sa...  2026.04  92.92
GPT-5 Nano                  Sampling Strategy=4-sa...  2026.04  88.47
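When several responses are sampled per question, Pass@1 accuracy is commonly estimated as the fraction of correct samples for each question, averaged over all questions (the k = 1 case of the unbiased pass@k estimator). The leaderboard does not document its exact scoring procedure, so the sketch below assumes that definition. The function name pass_at_1, its input format, and the example data are hypothetical and chosen only for illustration.

```python
from statistics import mean


def pass_at_1(samples_per_question: list[list[bool]]) -> float:
    """Estimate Pass@1 accuracy as a percentage.

    Assumed (hypothetical) input format: samples_per_question[i] lists
    whether each sampled response to question i was judged correct.
    """
    # Fraction of correct samples for each question (c / n).
    per_question = [sum(samples) / len(samples) for samples in samples_per_question]
    # Average over questions and scale to a percentage.
    return 100.0 * mean(per_question)


# Hypothetical data: 3 questions, 3 samples each (illustrative only).
results = [
    [True, True, True],     # 3/3 correct
    [True, False, True],    # 2/3 correct
    [False, False, True],   # 1/3 correct
]
print(round(pass_at_1(results), 2))  # prints 66.67
```

Averaging per question first means every question carries equal weight, even if the number of samples varied between questions.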