Mathematical Reasoning on MATH (test) (accuracy (r*), self-reward (r_self))

70.6 Accuracy (r*)

Qwen / Base

[Chart: Accuracy (r*); y-axis ticks 39.816, 47.808, 55.8, 63.792; x-axis May 6, 2026]
Updated 27d ago

Evaluation Results

Date      Accuracy (r*)   Self-reward (r_self)
2026.05   70.6            0.093
2026.05   68.2            0.064
2026.05   67.7            0.08
2026.05   67.6            0.06
2026.05   66.1            0.073
2026.05   63.1            0.094
2026.05   57.9            0.185
2026.05   51.3            0.175
2026.05   49.3            0.155
2026.05   48.1            0.088
2026.05   47.6            0.173
2026.05   47              0.118
2026.05   46.1            0.086
2026.05   45.7            0.106
2026.05   44.9            0.234
2026.05   41              0.405