Reasoning Robustness on Mathematical Reasoning Perturbation Experiments
[Chart: Robustness Perturbation Success Rate (R-PSR); highlighted value 76.2 for R1-Qwen-7B (Base), Sep 29, 2025. Selectable metrics: Robustness Perturbation Success Rate (R-PSR), Total Perturbation Success Rate (T-PSR).]
Updated 1mo ago
Evaluation Results
Method             Training Time  Date     R-PSR  T-PSR
R1-Qwen-7B (Base)  /              2025.09  76.2   5.9
SFT                9m 36s         2025.09  73.5   7.3
RL (GRPO)          4h 09m 35s     2025.09  35.6   23.1
FARL               4h 22m 53s     2025.09  29.5   20.1
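
To compare the rows directly, the reported scores can be expressed as signed differences from the R1-Qwen-7B (Base) row. The sketch below is only an illustration: the names `results` and `delta_vs_base` are hypothetical and not part of the benchmark, and the values are copied from the results above. The page does not say whether a higher or lower score is preferable, so the code reports signed differences without ranking the methods.

```python
# Scores copied from the Evaluation Results above (R-PSR, T-PSR).
# The dictionary layout and function name are illustrative assumptions.
results = {
    "R1-Qwen-7B (Base)": {"r_psr": 76.2, "t_psr": 5.9},
    "SFT": {"r_psr": 73.5, "t_psr": 7.3},
    "RL (GRPO)": {"r_psr": 35.6, "t_psr": 23.1},
    "FARL": {"r_psr": 29.5, "t_psr": 20.1},
}


def delta_vs_base(results, metric, base="R1-Qwen-7B (Base)"):
    """Return each non-base method's signed difference from the base row."""
    base_value = results[base][metric]
    return {
        method: round(scores[metric] - base_value, 1)
        for method, scores in results.items()
        if method != base
    }


r_delta = delta_vs_base(results, "r_psr")
t_delta = delta_vs_base(results, "t_psr")

for method in r_delta:
    print(f"{method}: R-PSR {r_delta[method]:+.1f}, T-PSR {t_delta[method]:+.1f}")
```

Run as written, this prints one line per trained method with its R-PSR and T-PSR change relative to the base model.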