Mathematical Reasoning on AIME 2025 (Pass@k and Voting Metrics)

95.4 Pass@1

Qwen3-235B-A22B-Thinking-2507

Updated 3mo ago

Evaluation Results

Method	Date	Pass@1	(label missing)	(label missing)	(label missing)	(label missing)
Qwen3-235B-A22B-Thinking-2507	2025.12	95.4	96.7	97.9	98.3	100.0
gpt-oss-120b	2025.12	92.3	93.3	95.1	96.7	96.7
DeepSeek-R1-0528	2025.12	87.1	88.3	87.5	90.8	96.7
QwQ-32B	2025.12	70.0	78.3	76.2	80.0	83.3
DeepSeek-R1-Distill-Qwen-32B	2025.12	55.9	63.8	66.6	68.0	76.7
DeepSeek-R1-Distill-Llama-70B	2025.12	44.0	50.0	56.1	57.8	70.0
DeepSeek-R1-Distill-Qwen-7B	2025.12	39.1	49.2	56.3	55.4	66.7
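
For readers unfamiliar with the two metric families named in the title, the following is a minimal sketch of how Pass@k and a voting metric are commonly computed for a single problem. It assumes the widely used unbiased Pass@k estimator, 1 - C(n-c, k) / C(n, k), and plain majority voting over final answers. This page does not state which estimator, sample count, or voting rule produced the scores above, so the function names `pass_at_k` and `majority_vote` and the sample values are illustrative only.

```python
from collections import Counter
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples is correct.

    n: samples drawn for the problem, c: samples that were correct,
    k: sample budget being scored. Assumed estimator, not confirmed
    as the one used by this leaderboard.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a hit.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent final answer among the samples.

    Ties go to the answer that appeared first (illustrative rule).
    """
    if not answers:
        raise ValueError("answers must not be empty")
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical problem: 4 samples drawn, 1 of them correct.
    print(pass_at_k(n=4, c=1, k=1))
    print(pass_at_k(n=4, c=1, k=4))
    # Hypothetical final answers from 3 samples of one problem.
    print(majority_vote(["70", "70", "204"]))
```

With 4 samples and 1 correct, `pass_at_k` gives 0.25 for k=1 and 1.0 for k=4. A benchmark-level score would then be the mean of the per-problem values, expressed as a percentage.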