Mathematical Reasoning on Olympiad Bench (Accuracy, Generation Length)

69.3 Accuracy

Critique-GRPO

[Chart: Accuracy over time, Jun 3, 2025 to Feb 13, 2026]
Updated 3d ago

Evaluation Results

Date      Accuracy   Generation Length
2025.06   69.3       -
2025.06   66.4       -
2025.06   62.7       -
2026.01   41.6       6,164
2026.01   40.9       2,398
2026.01   40         2,068
2026.01   39.9       2,915
2026.01   39.4       3,452
2026.01   36         1,659
2026.01   36         1,714
2026.01   30.4       7,389
2026.01   29.2       898
2026.01   27         987
2026.01   22.8       608
2026.02   14.44      14,326
2026.02   13.63      14,791
2026.02   13.63      13,956
2026.02   13.25      15,527
2026.02   12.81      14,410
2026.02   12.62      14,547
2026.02   12.62      14,189
2026.02   12.56      15,533
2026.02   11.62      12,908