Arithmetic Reasoning on AQuA, GSM8K, MAWPS, SVAMP

62.2 AQuA Accuracy

Qwen2.5-14B

Chart: accuracy over time, Jun 3, 2025 to Dec 26, 2025
Updated 1mo ago

Evaluation Results

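| Date | AQuA | GSM8K | MAWPS | SVAMP | Avg |
|---------|------|-------|-------|-------|------|
| 2025.12 | 62.2 | 78.5 | 92.4 | 87.4 | 80.1 |
| 2025.12 | 60.2 | 74 | 92.4 | 85.3 | 78 |
| 2025.12 | 59.8 | 78.1 | 94.1 | 88.8 | 80.2 |
| 2025.12 | 56.8 | 75.1 | 92 | 86.1 | 77.5 |
| 2025.12 | 43.7 | 64.4 | 89.7 | 78.6 | 69.1 |
| 2025.12 | 42.3 | 63.7 | 89.5 | 77.4 | 68.2 |
| 2025.12 | 31.9 | 49.7 | 84.9 | 65.8 | 58.1 |
| 2025.06 | 27.6 | 44.5 | 88.2 | 59.9 | 55.1 |
| 2025.06 | 26.8 | 54.8 | 87 | 69.4 | 59.5 |
| 2025.06 | 26.4 | 42.7 | 87.3 | 58.7 | 53.7 |
| 2025.06 | 26.4 | 44.1 | 89.5 | 59 | 54.8 |
| 2025.06 | 26 | 43.2 | 85.7 | 59 | 53.5 |
| 2025.06 | 26 | 55.5 | 89.9 | 66.9 | 59.6 |
| 2025.06 | 25.6 | 53.2 | 85.3 | 67.5 | 57.9 |
| 2025.06 | 25.6 | 55.3 | 87.8 | 67.4 | 59 |
| 2025.06 | 25.3 | 41.7 | 86.3 | 56 | 52.3 |
| 2025.06 | 24.8 | 38.6 | 81.1 | 60.5 | 51.2 |
| 2025.06 | 24.3 | 43 | 86.1 | 58.4 | 52.9 |
| 2025.06 | 24 | 32.1 | 87 | 51.5 | 48.7 |
| 2025.06 | 23.9 | 31.2 | 83.9 | 49 | 47.3 |
| 2025.06 | 23.9 | 54.9 | 87.7 | 66.5 | 58.3 |
| 2025.06 | 23.6 | 31.9 | 82.5 | 49.6 | 46.9 |
| 2025.06 | 23.4 | 43.1 | 85.1 | 59.2 | 52.7 |
| 2025.06 | 23.2 | 29.5 | 83.6 | 45.8 | 45.6 |
| 2025.06 | 23.2 | 41.7 | 87.4 | 58 | 52.6 |
| 2025.06 | 22.8 | 33.3 | 84 | 50.9 | 47.8 |
| 2025.06 | 22.1 | 21.7 | 76.6 | 39 | 39.9 |
| 2025.06 | 21.7 | 37 | 87.7 | 55.9 | 50.6 |
| 2025.06 | 20.5 | 30.9 | 86.6 | 46.9 | 46.2 |
| 2025.06 | 5.1 | 0.9 | 0.8 | 1.5 | 2.1 |
| 2025.06 | 0.9 | 0.5 | 0.1 | 0.7 | 0.6 |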
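The fifth value in each row appears to be the unweighted mean of the four benchmark scores. For the top entry, the scores 62.2, 78.5, 92.4 and 87.4 average to about 80.1, which matches the last value in that row. The sketch below checks this for the top row only. The mapping of scores to benchmarks follows the order in the page title and is an assumption, as are the names `scores`, `reported_avg` and `mean_score`.

```python
# Scores for the top-ranked entry, copied from the first results row.
# Assumption: columns follow the title order AQuA, GSM8K, MAWPS, SVAMP.
scores = {"AQuA": 62.2, "GSM8K": 78.5, "MAWPS": 92.4, "SVAMP": 87.4}

# Fifth value reported in the same row.
reported_avg = 80.1


def mean_score(per_benchmark):
    """Return the unweighted mean accuracy across benchmarks."""
    return sum(per_benchmark.values()) / len(per_benchmark)


computed = mean_score(scores)

# The page shows one decimal place, so compare within half a display unit.
matches = abs(computed - reported_avg) < 0.05
print(f"computed mean = {computed:.3f}, reported = {reported_avg}, match = {matches}")
```

An unweighted mean treats all four benchmarks equally regardless of test-set size, so a large gap on the smaller AQuA set moves the average as much as the same gap on GSM8K.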