Mathematical Reasoning on Minerva Math Dataset (avg.@8)

43.66 Average Accuracy @8

SFT

Updated 3mo ago

Evaluation Results

Method	Date	Average Accuracy @8
SFT	2025.08	43.66
PSFT (warm-up)	2025.08	43.38
PSFT	2025.08	43.33
SFT-KL	2025.08	42.19
Base	2025.08	40.53
PSFT (warm-up)	2025.08	33.64
PSFT	2025.08	32.4
SFT	2025.08	32.17
SFT-KL	2025.08	26.75
Base	2025.08	25.87