
Mathematical Reasoning on AIME 2024 (Accuracy, Length, and Length-Accuracy Metrics)

Top result: 36.7 accuracy (Effi. Reasoning)

Updated 3mo ago

Evaluation Results

Method	Links	Accuracy	Length	Length-Accuracy
Effi. Reasoning 2025.06		36.7	5,771	19.9
DAST 2025.06		36.7	5,400	21.4
kimi-k1.5 2025.06		33.3	5,159	20.3
Bingo-A 2025.06		33.3	2,943	26.7
Bingo-E 2025.06		33.3	2,943	26.7
Demystifying 2025.06		30	6,183	14.9
Vanilla PPO 2025.06		26.7	6,961	10.3
O1-Pruner 2025.06		26.7	5,958	13.9
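
The page does not say how the Length-Accuracy column is computed. As a rough consistency check, the sketch below assumes the score is accuracy scaled by the square root of the unused share of a token budget, L-Acc = Acc * sqrt(1 - Length / L_max), with L_max = 8192. Both the formula and the budget are assumptions, not something stated on this page; they are used here only because they reproduce every row of the table to within rounding. The names `length_accuracy`, `L_MAX`, and `ROWS` are hypothetical.

```python
import math

# Assumed token budget; not stated on the leaderboard page.
L_MAX = 8192


def length_accuracy(acc: float, length: float, l_max: float = L_MAX) -> float:
    """Assumed metric: accuracy scaled by sqrt of the unused length budget."""
    return acc * math.sqrt(1.0 - length / l_max)


# (method, accuracy, mean length in tokens, reported Length-Accuracy),
# copied from the table above.
ROWS = [
    ("Effi. Reasoning", 36.7, 5771, 19.9),
    ("DAST", 36.7, 5400, 21.4),
    ("kimi-k1.5", 33.3, 5159, 20.3),
    ("Bingo-A", 33.3, 2943, 26.7),
    ("Bingo-E", 33.3, 2943, 26.7),
    ("Demystifying", 30.0, 6183, 14.9),
    ("Vanilla PPO", 26.7, 6961, 10.3),
    ("O1-Pruner", 26.7, 5958, 13.9),
]

if __name__ == "__main__":
    for name, acc, length, reported in ROWS:
        recomputed = length_accuracy(acc, length)
        print(f"{name:16s} reported={reported:5.1f} recomputed={recomputed:6.2f}")
```

Under this assumed formula, a shorter output raises the score at fixed accuracy, which would explain why Bingo-A and Bingo-E lead the last column despite trailing on raw accuracy.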