Mathematical Reasoning on GSM

94 Accuracy

QwenPRM

[Chart: Accuracy over time, May 16, 2023 to Jun 1, 2026]
Updated 1d ago

Evaluation Results

Date      Accuracy
2026.03   94
2026.03   93.8
2026.03   93.7
2026.03   93.3
2025.05   93
2025.05   92.9
2023.06   92.5
2025.05   92.1
2026.03   92
2025.05   91.7
2025.05   91.3
-         91.2
-         90.9
2025.05   90.8
-         89.9
-         89.5
2025.05   89.1
2025.05   88.4
2023.05   84.8
2023.05   82.4
2025.05   77.4
2023.05   77.3
2025.05   76.9
2023.06   76
2025.02   75.5
2025.02   73.4
-         73.3
2025.02   72.9
2023.05   72.7
2025.02   72.1
2023.05   71.8
2025.02   71.8
2025.02   70.4
2025.02   69.8
2025.02   69.3
2025.05   69.1
2025.02   67.5
2025.05   66.7
2023.05   62.7
2023.06   60
2023.06   59
2023.06   59
2026.06   54.97
2026.06   53.37
2026.06   53.3
2026.06   53.29
2023.06   53
2023.06   53
2026.06   51
2026.06   50.19
2023.06   50
2026.06   47
2026.06   46.3
2026.06   45.94
2026.06   44.88
2023.06   40.5
2026.06   38.69
2023.06   37
2023.06   36
2025.02   29.9
2025.02   28.5
2025.02   28.1
2023.06   25
2023.06   25
2023.05   22.2
2026.06   15.85
2023.06   14.5
2023.06   12
2023.06   10
2026.06   1.3