
Mathematical Reasoning on AIME 2025 (Pass@1, TC)

30 Pass@1

ARPO

Updated 1mo ago

Evaluation Results

Method	Links	Pass@1	TC
ARPO 2026.06		30	1.03
EAPO 2026.06		26.6	0.93
GRPO 2026.06		23.3	0.83
Reinforce++ 2026.06		23.3	1.03
AEPO 2026.06		23.3	1.21
ToolStar 2026.06		20	1.4
Reinforce++ 2026.06		16.7	1.3
ARPO 2026.06		16.7	1.3
GRPO 2026.06		13.3	1.22
ToolStar 2026.06		13.3	1.11
EAPO 2026.06		13.3	0.89
Base 2026.06		10	-
TIR 2026.06		10	0.16
Base 2026.06		6.7	-
TIR 2026.06		6.7	0.2
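
As a rough sanity check on the Pass@1 column, the scores can be converted back to solved-problem counts. The sketch below assumes the benchmark uses all 30 AIME 2025 problems (AIME I and AIME II, 15 each) and that each score comes from a single run. The page states neither, and the name `solved_count` is hypothetical.

```python
# Assumption (not stated on the page): the AIME 2025 set has 30 problems
# (AIME I + AIME II, 15 each) and each score comes from a single run.
NUM_PROBLEMS = 30


def solved_count(pass_at_1_percent: float, num_problems: int = NUM_PROBLEMS) -> int:
    """Convert a Pass@1 percentage to the nearest whole number of solved problems."""
    return round(pass_at_1_percent * num_problems / 100)


# Distinct Pass@1 values that appear in the table above.
scores = [30, 26.6, 23.3, 20, 16.7, 13.3, 10, 6.7]

for score in scores:
    print(f"{score:>5} -> {solved_count(score)}/{NUM_PROBLEMS}")
```

If the assumption holds, every distinct score in the table maps to a whole number of problems, and neighbouring score levels differ by exactly one solved problem (about 3.3 points), which is worth keeping in mind when comparing closely ranked methods.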