
Mathematical Reasoning on Math ID GSM8k ProofNet

99.2 GSM8k Accuracy

Qwen3-8B pass@N (Upper Bound)

[Chart: GSM8k Accuracy (y-axis ticks 68.936, 76.793, 84.65, 92.507) by date (x-axis: Nov 9, 2025)]
Updated 1mo ago

Evaluation Results

Method  GSM8k Accuracy  Metric 2  Metric 3  Metric 4  Date     Links
        99.2            99.3      97.8      -         2025.11
        97.8            93.7      76        -         2025.11
        97.8            92.7      76.5      -         2025.11
        97.6            93.4      76        -         2025.11
        97.6            -         -         -         2025.11
        97.6            94.4      76.5      -         2025.11
        97.5            95.7      76        -         2025.11
        97.5            94.4      73.6      -         2025.11
        97.3            92.7      74.4      -         2025.11
        96.4            92.7      71.7      -         2025.11
        96.3            93.7      71.7      -         2025.11
        96.1            94.4      78.5      -         2025.11
        96.1            92.7      74.1      -
        95.6            92.4      74.1      -         2025.11
        95.5            93        72.8      -         2025.11
        95.1            91.7      71.4      -         2025.11
        93.3            83.1      77.4      84.6      2025.11
        85.4            62.6      52.9      67        2025.11
        85              67.3      57        69.8      2025.11
        85              62.2      57.9      68.4      2025.11
        84.6            67.3      56.2      69.4      2025.11
        84              55.6      52.9      64.2      2025.11
        84              55.6      52.9      64.2      2025.11
        83.6            62.2      47.1      64.3      2025.11
        82.8            64.9      49.6      65.8      2025.11
        82.4            62.6      48.8      64.6      2025.11
        78.3            63.7      52.9      65        2025.11
        70.1            54.6      47.7      57.5