
Mathematical Reasoning on Minerva Math Dataset (avg.@8)

43.66 Average Accuracy @8

SFT

Aug 25, 2025
Updated 4d ago

Evaluation Results

Date       Average Accuracy @8
2025.08    43.66
2025.08    43.38
2025.08    43.33
2025.08    42.19
2025.08    40.53
2025.08    33.64
2025.08    32.4
2025.08    32.17
2025.08    26.75
2025.08    25.87