Mathematical Reasoning on MATH (Accuracy, Response Tokens, Length Reduction)

Best accuracy: 46.74 (GRPO+FIRSTN)

Updated 1mo ago

Evaluation Results

Method	Date	Accuracy	Response Tokens	Length Reduction (%)
GRPO+FIRSTN	2026.05	46.74	371.49	8
GRPO	2026.05	46.07	403.94	-
CPPO	2026.05	45.28	359.8	10.9
Pair	2026.05	45.02	163.15	59.6
BPPO	2026.05	44.76	154.1	61.9
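
The length-reduction figures appear to be the percentage drop in response tokens relative to the GRPO row, which is the only row without a reduction value. The page does not state this, so the baseline choice and the formula below are assumptions. The sketch recomputes the column from the listed token counts as a consistency check; the helper name `length_reduction` is hypothetical.

```python
# Average response-token counts copied from the results table.
response_tokens = {
    "GRPO+FIRSTN": 371.49,
    "GRPO": 403.94,
    "CPPO": 359.8,
    "Pair": 163.15,
    "BPPO": 154.1,
}


def length_reduction(tokens: float, baseline: float) -> float:
    """Percentage reduction in response tokens relative to a baseline."""
    return 100.0 * (baseline - tokens) / baseline


# Assumption: GRPO is the reference, since its reduction is listed as "-".
baseline = response_tokens["GRPO"]

reductions = {
    method: round(length_reduction(tokens, baseline), 1)
    for method, tokens in response_tokens.items()
    if method != "GRPO"
}

for method, pct in reductions.items():
    print(f"{method}: {pct}%")
```

Under that assumption the recomputed values agree with the table to one decimal place (8.0, 10.9, 59.6, 61.9), which supports reading the last column as a percentage measured against GRPO.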