
Mathematical Reasoning on AIME 24 (Accuracy@4, GRPO Speedup)

18.3 Accuracy (mean@4)

DAPO

May 8, 2026
Updated 22d ago

Evaluation Results

Date     Accuracy (mean@4)  GRPO Speedup
2026.05  18.3               0.63
2026.05  16.7               0.75
2026.05  14.2               1.43
2026.05  13.3               1
2026.05  13.3               2.38
2026.05  12.5               1.62
2026.05  11.7               1.19
2026.05  10.8               2.51
2026.05  10                 -
2026.05  9.2                0.95
2026.05  8.3                1.3
2026.05  7.5                0.81
2026.05  7.5                1
2026.05  7.5                0.82
2026.05  7.5                1.26
2026.05  6.7                1
2026.05  6.7                1.13
2026.05  6.7                2.04
2026.05  5.8                -
2026.05  5.8                0.95
2026.05  3.3                -