Mathematical Reasoning on GSM8K (Avg. & Avg.Q Metrics)

58.76 GSM8K Accuracy

RAISE

[Chart: GSM8K accuracy over time. Y-axis ticks: 41.964, 46.3245, 50.685, 55.0455. X-axis label: Apr 9, 2025]
Updated 4d ago

Evaluation Results

Date      GSM8K Accuracy   (unlabeled)   (unlabeled)
2025.04   58.76            62.76         29.55
2025.04   55.95            62.04         12.99
2025.04   54.66            61.48         0
2025.04   52.99            60.69         -18.13
2025.04   51.79            59.51         -45.1
2025.04   49.71            59.87         -36.97
2025.04   48.56            59.69         -40.99
2025.04   47.84            59.23         -51.49
2025.04   46.55            59.82         -38.06
2025.04   44.58            58.3          -72.84
2025.04   44.58            58.34         -71.92
2025.04   44.03            57.49         -91.42
2025.04   43.75            57.98         -80.25
2025.04   42.61            57.12         -100