Reasoning on Mathematical Reasoning (train)
[Chart: MTL (Loss), Accuracy; Sep 29, 2025]
Updated 1mo ago
Evaluation Results
Method             Training Time   Date      MTL (Loss)   Accuracy
SFT                9m 36s          2025.09   1,177.8      75.8
R1-Qwen-7B (Base)  /               2025.09   1,244        76.9
RL (GRPO)          4h 09m 35s      2025.09   1,481.9      93.2
FARL               4h 22m 53s      2025.09   1,660.8      92.4