STEM Reasoning on MMLU-Redux 2.0

97.77 Pass@1 Accuracy

Qwen3-30B-A3B (Thinking)

Updated 1mo ago

Evaluation Results

Method	Links	Pass@1 Accuracy
Qwen3-30B-A3B (Thinking) 2026.04		97.77
Gemini 2.5 Flash 2026.04		96.85
GPT-5 Mini 2026.04		96.4
GPT-OSS-120B 2026.04		95.94
Nemotron 3 Nano 30B A3B 2026.04		94.1
GPT-OSS-20B 2026.04		93.32
Aryabhata 2 2026.04		92.92
GPT-5 Nano 2026.04		88.47