
Mathematical Reasoning on AIME 2024 (Mean@64 accuracy)

53.6 Mean@64 Accuracy

Agentic Proposing

Updated 3mo ago

Evaluation Results

Method	Links	Mean@64 Accuracy
Agentic Proposing 2026.02		53.6
Polaris 2026.02		51.7
Deepmath 2026.02		51.2
PromptCoT 2.0 2026.02		50.9
MathSmith 2026.02		50.3
Socratic-zero 2026.02		50.2
PromptCoT 2026.02		50.1
Qwen3-4B-Instruct-2507 2026.02		49.8
OpenthoughtsS3 2026.02		49.6
DeepSeek-V3.2-Spe 2026.02		49.5
OpenR1math 2026.02		49.4
Claude4.5-Opus 2026.02		48.3
GPT-5.2-High 2026.02		47.7
NuminaMath 2026.02		47.3
Qwen3-Max 2026.02		46.2
R-zero 2026.02		45.8
Gemini-3-Pro 2026.02		45.5
Wizardmath 2026.02		44.8
Metamath 2026.02		44.6
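
The scores above are reported as Mean@64 accuracy. The page does not define the metric, so the sketch below assumes the common reading: 64 responses are sampled per problem, each is graded correct or incorrect, and the score is the average correctness over all samples and problems, expressed as a percentage. The function name `mean_at_k` and the toy data are hypothetical and only illustrate the arithmetic.

```python
from typing import Sequence


def mean_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Return Mean@k accuracy as a percentage.

    correct[i][j] is True when sample j for problem i was graded correct.
    Assumes every problem has the same number of samples k (k = 64 on
    this leaderboard, under the assumed definition).
    """
    if not correct:
        raise ValueError("no problems given")
    k = len(correct[0])
    if k == 0 or any(len(row) != k for row in correct):
        raise ValueError("every problem needs the same non-zero number of samples")
    # Per-problem accuracy: fraction of the k samples that are correct.
    per_problem = [sum(row) / k for row in correct]
    # Average over problems and scale to a percentage.
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy data: 2 problems, k = 4 samples each (hypothetical, not leaderboard data).
toy = [
    [True, True, False, False],   # 2 of 4 correct -> 0.50
    [True, False, False, False],  # 1 of 4 correct -> 0.25
]
score = mean_at_k(toy)
print(score)  # 37.5
```

Averaging over many samples per problem reduces the run-to-run variance that a single-sample score would show on a benchmark as small as AIME 2024, which has 30 problems.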