Reasoning Robustness on Mathematical Reasoning Perturbation Experiments

Robustness Perturbation Success Rate (R-PSR): 76.2

R1-Qwen-7B (Base)

[Chart: R-PSR by date; y-axis ticks 27.632, 40.241, 52.85, 65.459; x-axis label Sep 29, 2025]
Updated 1mo ago

Evaluation Results

Date      R-PSR   (unlabeled)
2025.09   76.2    5.9
2025.09   73.5    7.3
2025.09   35.6    23.1
2025.09   29.5    20.1