
Mathematical Reasoning on AIME 2025 (Mean@64 accuracy)

51.2 Mean@64 Acc

Agentic Proposing

Updated 1mo ago

Evaluation Results

Method	Date	Mean@64 Acc
Agentic Proposing	2026.02	51.2
PromptCoT 2.0	2026.02	48.5
Deepmath	2026.02	48.2
Polaris	2026.02	47.4
PromptCoT	2026.02	47.3
MathSmith	2026.02	47.1
Socratic-zero	2026.02	46.9
Qwen3-4B-Instruct-2507	2026.02	46.7
OpenthoughtsS3	2026.02	46.1
OpenR1math	2026.02	45.8
GPT-5.2-High	2026.02	45.8
Claude4.5-Opus	2026.02	45.7
DeepSeek-V3.2-Spe	2026.02	45.6
R-zero	2026.02	44.3
Wizardmath	2026.02	44.0
NuminaMath	2026.02	43.9
Metamath	2026.02	43.5
Qwen3-Max	2026.02	43.4
Gemini-3-Pro	2026.02	43.1
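
The page does not define Mean@64. A minimal sketch of how such a score is commonly computed, assuming it means the fraction of correct answers among 64 sampled responses per problem, averaged over all problems and reported on a 0-100 scale. The function name `mean_at_k` and the input layout are hypothetical and not taken from any listed method's evaluation code.

```python
from typing import Sequence


def mean_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Average per-problem accuracy over k sampled responses.

    correct[i][j] is True if sample j for problem i was graded correct.
    Assumption: every problem has at least one sample (k = 64 on this page).
    """
    if not correct or any(len(row) == 0 for row in correct):
        raise ValueError("need at least one problem and one sample per problem")
    # Fraction of correct samples for each problem.
    per_problem = [sum(row) / len(row) for row in correct]
    # Average across problems, scaled to 0-100.
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy usage with 2 problems and k = 2 samples each (not real benchmark data).
score = mean_at_k([[True, False], [True, True]])
```

Under this reading, a score of 51.2 would mean that a sampled response is correct about half the time on average, which is a stricter summary than counting a problem as solved if any one of the 64 samples is correct.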