STEM Reasoning on MMLU-Redux 2.0

97.77 Pass@1 Accuracy

Qwen3-30B-A3B (Thinking)

[Chart: Pass@1 accuracy; y-axis ticks 88.098, 90.609, 93.12, 95.631; x-axis label Apr 10, 2026]
Updated 5d ago

Evaluation Results

Method  Links
2026.04  97.77
2026.04  96.85
2026.04  96.4
2026.04  95.94
2026.04  94.1
2026.04  93.32
2026.04  92.92
2026.04  88.47