Mathematical Reasoning on Olympiad (test)

Top Accuracy: 52.1 (OpenAI-o1-preview)


Evaluation Results

Method	Date	Accuracy	Second metric (unnamed in source)
OpenAI-o1-preview	2025.02	52.1	-
GPT-4o	2025.02	43.3	-
NuminaMath-72B	2025.02	36.7	-
AutoCode4Math-Qwen2.5	2025.02	32.6	-
Qwen-2.5-Base-7B	2025.02	30.37	-
AutoCode4Math-DeepSeek	2025.02	26.95	-
AutoCode4Math-Qwen2	2025.02	26.37	-
DeepSeekMath-CRPS-60K	2026.04	24.6	-
MathFusion	2026.04	23.3	-
DeepSeekMath-CRPS-30K	2026.04	23.2	-
Dart-Math-Llama3-8B	2025.02	23	-
SIGMA-60K	2026.04	22.5	-
NuminaMath-7B-CoT	2025.02	22.22	-
DeepSeekMath-DART	2026.04	21.7	-
Qwen2Math-Base-7B	2025.02	21.62	-
MathFusion (Sequential)	2026.04	21.6	-
SIGMA-30K	2026.04	21.6	-
Mathstral-7B	2025.02	21.5	-
DeepSeekMath-CRPS-15K	2026.04	21.4	-
DeepSeekMath-DART	2026.04	21	-
DeepseekMath-Instruct-7B	2025.02	20.44	-
DeepSeekMath-RFT	2026.04	19.1	-
Dart-Math-DeepSeek-7B	2025.02	18.52	-
DeepSeekMath-Instruct	2026.04	14.2	-
DeepSeekMath-MMIQC	2026.04	13	-
Mammoth-Mistral-7B	2025.02	9.63	-
DeepSeekMath-MetaMath	2026.04	9.5	-
Qwen3-4B-Base	2026.05	-	28.6
GRPO	2026.05	-	40.5
GSPO	2026.05	-	39.8
DAPO	2026.05	-	41.8
HölderPO	2026.05	-	40.6
GRPO	2025.10	-	42.8
GRPO + random credit	2025.10	-	42
GRPO + high-entropy credit	2025.10	-	42.5
GRPO + global-anchor credit	2025.10	-	43
GRPO + local-chunk credit	2025.10	-	43.1
GRPO + coupled rhythm credit	2025.10	-	44.1
GRPO	2025.10	-	44.2
GRPO + random credit	2025.10	-	43.3
GRPO + high-entropy credit	2025.10	-	45.6
GRPO + global-anchor credit	2025.10	-	46.1
GRPO + local-chunk credit	2025.10	-	45.9
GRPO + coupled rhythm credit	2025.10	-	47
GRPO	2025.10	-	49.9
GRPO + random credit	2025.10	-	50
GRPO + high-entropy credit	2025.10	-	48.6
GRPO + global-anchor credit	2025.10	-	51.2
GRPO + local-chunk credit	2025.10	-	51.5
GRPO + coupled rhythm credit	2025.10	-	52.2
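The 2025.10 entries list three runs of GRPO followed by five credit-assignment variants. The page does not say what distinguishes the three runs, so the sketch below simply assumes that each block of six rows starting with a plain "GRPO" row is one setting, and computes each variant's change over that setting's GRPO baseline in the second (unnamed) metric column. The names VARIANTS, SETTINGS and gains_over_baseline are illustrative only; the scores are copied from the table.

```python
# Second-metric scores for the 2025.10 GRPO credit-assignment rows, copied from
# the table. ASSUMPTION: each block of six rows starting with a plain "GRPO"
# row is one experimental setting; the source does not say what differs
# between the three blocks.
VARIANTS = [
    "GRPO",
    "GRPO + random credit",
    "GRPO + high-entropy credit",
    "GRPO + global-anchor credit",
    "GRPO + local-chunk credit",
    "GRPO + coupled rhythm credit",
]
SETTINGS = [
    [42.8, 42.0, 42.5, 43.0, 43.1, 44.1],
    [44.2, 43.3, 45.6, 46.1, 45.9, 47.0],
    [49.9, 50.0, 48.6, 51.2, 51.5, 52.2],
]


def gains_over_baseline(scores):
    """Return {variant: score minus the GRPO baseline} for one setting, rounded to 0.1."""
    baseline = scores[0]  # first row of each block is plain GRPO
    return {
        name: round(score - baseline, 1)
        for name, score in zip(VARIANTS[1:], scores[1:])
    }


if __name__ == "__main__":
    for i, scores in enumerate(SETTINGS, start=1):
        print(f"Setting {i} (GRPO baseline {scores[0]}):")
        for name, delta in gains_over_baseline(scores).items():
            print(f"  {name:<30} {delta:+.1f}")
```

Under this grouping, coupled rhythm credit gives the largest gain over GRPO in all three settings (+1.3, +2.8 and +2.3 points), while random and high-entropy credit fall below the baseline in some settings.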