Reasoning on Mathematical Reasoning (train)

1,177.8 MTL (Loss)

SFT

Updated 4mo ago

Evaluation Results

Method	Links	MTL (Loss)
SFT 2025.09		1,177.8	75.8
R1-Qwen-7B (Base) 2025.09		1,244	76.9
RL (GRPO) 2025.09		1,481.9	93.2
FARL 2025.09		1,660.8	92.4