
Mathematical Reasoning on GSM8K (Accuracy, Mean Response Tokens, Length Reduction)

Accuracy: 76.96

GRPO

Updated 1mo ago

Evaluation Results

Method	Links	Accuracy	Mean Response Tokens	Length Reduction (%)
GRPO 2026.05		76.96	265.08	-
GRPO+FIRSTN 2026.05		76.85	237.77	10.3
Pair 2026.05		74.44	144.77	45.4
BPPO 2026.05		73.84	122.39	53.8
CPPO 2026.05		73.64	232.62	12.2
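
The page does not state how the Length Reduction column is computed. The reported figures are consistent with a percentage drop in mean response tokens relative to the GRPO baseline (265.08 tokens), so the sketch below assumes that formula. The function name `length_reduction` and the `MEAN_TOKENS` dictionary are illustrative and not part of the leaderboard.

```python
# Assumed formula (not stated in the source):
#   reduction % = 100 * (baseline_tokens - method_tokens) / baseline_tokens
# with GRPO as the baseline.

BASELINE_TOKENS = 265.08  # GRPO mean response tokens, from the table

# Mean response tokens per method, copied from the table.
MEAN_TOKENS = {
    "GRPO+FIRSTN": 237.77,
    "Pair": 144.77,
    "BPPO": 122.39,
    "CPPO": 232.62,
}


def length_reduction(baseline_tokens: float, method_tokens: float) -> float:
    """Percentage drop in mean response tokens, rounded to one decimal."""
    return round(100.0 * (baseline_tokens - method_tokens) / baseline_tokens, 1)


if __name__ == "__main__":
    for method, tokens in MEAN_TOKENS.items():
        print(method, length_reduction(BASELINE_TOKENS, tokens))
```

Under this assumption the computed values match the reported column for all four methods, which suggests the baseline row shows "-" simply because it is the reference point.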