
Mathematical Reasoning on MGSM (test)

88 Accuracy (ZH)

self-cons

[Chart: Accuracy (ZH) over time, Oct 16, 2025 to May 31, 2026]
Updated 1d ago

Evaluation Results

Method Links
2026.05
88----88.4-88.487.865.689.282.280.849.681.676.944.6
2026.01
86----91.288.6----------
2026.05
86----87.6-66.881.25471.265.251.650.869.865.738.6
2026.01
85.6----89.687.6----------
2026.05
85.2----90.4-89.286.860.486.480.480.8547675.640.4
2026.01
84.8----9087.4----------
2026.01
84.4----92.488.4----------
2026.01
84----92.488.2----------
2026.05
82.3----84.2-49.270.433.46657.353.241.470.257.626.2
2026.05
80.8----84.2-4475.237.661.648.847.240.46655.221.8
2026.01
79.6----88.484----------
2026.01
79.6----90.485----------
2026.01
75.6----88.482----------
2026.01
74.8----87.681.2----------
2026.01
74.8----88.881.8----------
2026.01
74----8278----------
2026.03
71.6----83.6-7477.263.274.472.865.267.668.471.8-
2026.03
70.4----83.2-72.47660.872.465.659.266.86469.08-
2025.10
70.3----79.2-73.97569.669.369.869.161.972.571.1-
2026.03
70----83.6-77.276.460.471.673.664.865.266.470.92-
2026.03
68.8----83.2-74.478.465.675.27462.866.865.671.48-
2026.05
68.4----82.4-69.674.864.476.47257.259.241.665.5-
2025.10
67.9----78.9-72.575.166.47169.364.559.169.369.4-
2026.03
67.6----66.4-6059.649.66458.454.448.462.859.12-
2026.03
62.8----74.4-63.262.443.267.660.459.658.44459.6-
2026.05
60.4----88.2-72.466.45063.67063.28.864.455.915.6
2026.05
60----79.6-55.260.451.268.866.460.425.25654.617.2
2026.05
59.6----81.8-7672.454.875.271.655.617.673.259.324
2026.05
59.6----71.6-59.362.844.859.263.65655.247.658-
2026.05
59.2----72-626447.25760.847.632.853.655.6-
2025.12
56.8--62.1---60.469.2--------
2026.03
56.4----65.2-60.46247.259.26250.849.654.456.72-
2026.05
55.6----57.6-55.654.418525439.68.437.239.96.4
2026.03
55.2----62.8-58.461.24656.857.246.449.651.254.48-
2025.10
54.8----64.8-62.459.629.659.658.4522835.250.4-
2025.10
54.4----71.2-60.4624660.45652.448.43654.7-
2026.05
54----50.3-55.263.221.253.656.4365.24840.97.2
2026.05
54----71.6-6262.838.450.859.238.430.847.251.5-
2025.10
53.6----66.8-60.858.450.461.257.654.457.252.857.3-
2026.05
53.1----66.8-59.161.141.660.160.857.350.952.151.5-
2026.03
52.8----67.2-57.660.848.454.857.247.246.445.653.8-
2026.03
52.8----62.4-56.860.4505657.65047.648.854.24-
2026.05
52.6----78.6-39.250.333.467.654.443.611.460.246.418.6
2026.03
52----66.8-566047.252.851.644.845.644.452.12-
2026.03
51.2----62.8-56.459.240.454.852.451.653.247.252.92-
2026.05
51.2----51.6-47.653.24249.249.639.6464647.6-
2026.03
50.8----60-55.256.849.654.855.6504846.452.72-
2026.03
50.8----57.2-50.448.4345045.642.843.633.245.6-
2025.12
50.4--58.3---58.466--------
2026.05
50.4----54.1-5057.217.247.652.832.8640.737.64.4
2026.05
49.6----73.2-58.864.442.458.857.240.831.635.651.2-
2026.03
49.2----54-45.254.838.846.444.438.840.84445.64-
2026.05
49.2----78.6-37.440.23060.448.249.819.854.844.218.2
2026.03
48.8----67.2-57.659.243.657.26045.649.242.453.08-
2026.05
48.8----53.2-47.64835.244.448.443.242.846.845.8-
2025.10
48.4----68-52.459.639.654.856.8444440.449.6-
2026.03
48----62-57.254.840.45654.851.646.835.250.68-
2025.10
46.8----65.5-47.659.648.460.456.449.237.637.650.6-
2026.03
46----53.2-49.247.634.849.240.845.243.232.444.16-
2026.03
46----49.2-46.848.841.645.248.44441.237.644.88-
2026.03
46----56.8-49.253.244.449.25047.244.837.647.84-
2025.10
45.2----63.2-52.45842.856.450.84043.250.450.2-
2026.05
45.2----54.8-48.445.233.243.63835.638.436.441.9-
2026.03
44----56.8-49.253.244.846.446.446.444.83446.6-
2026.05
42.8----46.8-3845.248.451.237.644.838.843.243.7-
2026.03
42----50.8-46.447.633.652.843.646.438.839.244.12-
2025.10
42----52-45.24833.245.244.842424043.4-
2026.05
41.2----52.4-46.548.81241.248.131.62.834.633.14.4
2025.12
40.8--45.1---45.249.2--------
2026.05
40.8----52.4-38.85014.444.242.123.68.436.833.412.4
2026.03
40.4----48-484840.849.64642.442.433.643.92-
2025.10
38.4----68.8-5257.26.855.254.436.46.87.238.3-
2026.03
36----48-39.639.63040.439.232.435.232.837.32-
2025.12
34.4--40.5---41.645.6--------
2025.12
33.2--37.6---37.642--------
2026.05
31.6----52-33.639.218.83836.427.223.621.632.2-
2025.12
21.6--26.5---21.636.4--------
2025.12
20.8--28.7---28.436.8--------
2025.12
12.8--13.9---10.418.4--------
2025.12
8.4--10.1---4.417.6--------
2023.05
-72---------------
2023.05
-45.957.9--------------
2023.05
-72.287--------------
2023.05
-75.985.8--------------
2024.10
---33-------------
2024.10
---38-------------
2024.10
---7-------------
2024.10
---42-------------
2024.10
---3-------------
2025.02
---39.2-------------
2025.02
---56.6-------------
2025.02
---38.6-------------
2025.02
---46.7-------------
2025.02
---61.1-------------
2025.02
---44.6-------------
2025.02
---57.2-------------
2025.02
---71.2-------------
2025.02
---49-------------
2025.02
---62.8-------------
2025.02
---75.6-------------
Showing 100 of 124 rows