
Mathematical Reasoning Process Evaluation on ProcessBench (test)

96.9 GSM8K Accuracy

Qwen2.5-Math-7B-MathShepherd-r

Updated 14d ago

Evaluation Results

Method  Links

2026.04  96.9  51.7  67.4  20    96.8  33.2  7.3   95.9  13.5  4     96.7  7.6   30.4
2026.04  96.4  72    82.4  68    90.4  77.6  55.7  85.5  67.5  55.2  83    66.3  73.5
2026.04  95.9  73.4  83.2  68.2  89.2  77.3  57.9  81.1  67.6  57.6  78.8  66.5  73.6
2026.04  95.3  53.1  68.2  48    90.1  62.6  35.7  87.3  50.7  29.8  86.1  44.3  56.5
2026.04  93.8  58    71.7  34.7  88.9  49.9  19.2  81.1  31.1  13.6  79.7  23.2  44
2026.04  91.7  58.5  71.4  53.9  55.4  54.6  46.3  27.1  34.2  42.7  32.4  36.8  49.3
2026.04  91.2  45.9  61.1  46    58.1  51.3  43.7  31.3  36.5  42.6  35.3  38.6  46.9
2026.04  90.2  67.6  77.3  67.2  73.4  70.1  51.7  53.1  52.4  50.6  52.7  51.6  62.9
2026.04  89.6  58    70.4  60.8  74.4  66.9  48.4  56    52    49.7  57.3  53.2  60.6
2025.07  87.9  -     88    -     78.5  78.7  -     59.2  57.8  -     61.1  61.3  71.5
2025.07  83.7  -     82.9  -     63.7  59.4  -     54.3  46.7  -     51    43    58
2026.04  82.9  61.8  70.8  43.8  62.2  53.6  17.9  31.9  22.9  14    41.9  21    42.1
2025.07  80.2  -     79.2  -     63.4  63.6  -     50.1  51.4  -     50.1  53.5  61.9
2025.07  77.9  -     76.2  -     65.4  61.8  -     59.8  54.6  -     55.1  52.2  61.2
2025.07  75.3  -     74.9  -     52.6  48.2  -     50    46.7  -     43.2  41    52.7
2025.07  73.5  -     68.2  -     65.1  62.6  -     53.2  50.7  -     43.4  44.3  56.5
2025.07  72.3  -     69.3  -     59.2  53.3  -     50.2  45    -     43.5  41.3  52.2
2025.07  72    -     68.6  -     67.3  67.7  -     54.6  56    -     47.8  51.3  60.9
2025.07  72    -     68.9  -     64.5  60.1  -     57    48.9  -     52.5  46.3  56.1
2025.07  71.6  -     70.8  -     54.5  53.6  -     25.6  22.9  -     23.7  21    42.1
2025.07  70.6  -     65.6  -     61.9  53.1  -     53.5  40    -     47.7  38.3  49.3
2025.07  70.3  -     65.8  -     59.6  52.1  -     56.1  32.5  -     55.1  31.7  45.5
2025.07  67.8  -     67.6  -     52.3  49.2  -     43.3  42.1  -     39.3  40.2  49.8
2025.07  62.4  -     52.2  -     48.3  22.8  -     46.2  21.2  -     44.8  20    29.1
2025.07  62.3  -     50.4  -     42.1  33.4  -     22.3  13.8  -     19.1  15.8  28.4
2025.07  61.9  -     50.1  -     54.2  39.9  -     51.4  34    -     55.6  27.3  37.8
2025.07  59.9  -     59    -     49.1  48    -     20.5  19.3  -     19.7  19.2  36.4
2025.07  58.3  -     47.9  -     45.1  29.5  -     39.7  24.8  -     34.8  23.8  31.5
2025.07  56.9  -     38.8  -     45.1  33.8  -     26.5  16.9  -     23.2  16.9  26.6
2025.07  54.4  -     26.8  -     50.3  25.7  -     43.1  14.2  -     41.6  12.7  19.9
2025.07  49.1  -     14.3  -     46.3  6.5   -     47.2  4.1   -     48.9  1.8   6.7
2025.07  37.8  -     36.5  -     36.9  36.6  -     29.9  29.7  -     27.3  27.4  32.6
2025.07  27.3  -     10.9  -     20.5  5.1   -     16    2.8   -     15    1.6   5.1
2025.07  27.1  -     13.1  -     17.3  13.8  -     14.2  4.8   -     19.7  12.6  11.1
2025.07  25.1  -     8.4   -     20.4  19    -     16.1  14.7  -     13.8  12.1  13.6