
Mathematical Reasoning on AMO-Bench (Avg@5)

Top score: 0.646 Avg@5 (GPT-5.1-high)

Updated 1mo ago

Evaluation Results

Model	Date	Avg@5
GPT-5.1-high	2026.01	0.646
Gemini-3-Pro-Preview	2026.01	0.611
GPT-5.1-high	2026.01	0.588
GPT-5-high	2026.01	0.584
GPT-5.1-high	2026.01	0.56
Gemini-3-Pro-Preview	2026.01	0.56
Gemini-3-Pro-Preview	2026.01	0.56
GPT-5-high	2026.01	0.544
GLM-4.6	2026.01	0.529
DeepSeek-V3.1-thinking	2026.01	0.528
DeepSeek-V3.1-thinking	2026.01	0.521
DeepSeek-V3.2-thinking	2026.01	0.516
GLM-4.6	2026.01	0.48
Seed-1.6-1015-high	2026.01	0.48
DeepSeek-V3.1-thinking	2026.01	0.48
DeepSeek-V3.2-thinking	2026.01	0.46
Gemini-2.5-Pro	2026.01	0.435
Seed-1.6-1015-high	2026.01	0.431
Seed-1.6-Thinking-0715	2026.01	0.416
GLM-4.6	2026.01	0.4
Seed-1.6-Thinking-0715	2026.01	0.4
GPT-5-high	2026.01	0.4
Seed-1.6-Lite-1015-high	2026.01	0.397
qwen3-max-0923	2026.01	0.392
Seed-1.6-Thinking-0715	2026.01	0.38
Gemini-2.5-Pro	2026.01	0.376
Minimax-M2	2026.01	0.366
Seed-1.6-1015-high	2026.01	0.364
Seed-1.6-Lite-1015-high	2026.01	0.364
Claude-Sonnet-4.5-thinking	2026.01	0.36
Seed-1.6-Lite-1015-high	2026.01	0.36
qwen3-max-0923	2026.01	0.352
Claude-Sonnet-4.5-thinking	2026.01	0.324
Minimax-M2	2026.01	0.284
Gemini-2.5-Pro	2026.01	0.28
DeepSeek-V3.2-thinking	2026.01	0.28
Kimi-K2-thinking	2026.01	0.265
Kimi-K2-thinking	2026.01	0.22
GPT-5.1-chat-latest	2026.01	0.207
Minimax-M2	2026.01	0.2
Kimi-K2-0905	2026.01	0.199
Kimi-K2-0905	2026.01	0.188
Claude-Sonnet-4.5-thinking	2026.01	0.18
GPT-5.1-chat-latest	2026.01	0.176
Kimi-K2-thinking	2026.01	0.172
qwen3-max-0923	2026.01	0.14
Kimi-K2-0905	2026.01	0.08
GPT-5.1-chat-latest	2026.01	0.06
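The page does not define Avg@5. Under the usual Avg@k convention, each problem is attempted five times, each attempt is graded correct or incorrect, and the score is the per-problem fraction of correct attempts averaged over all problems. The sketch below assumes that convention; the function name `avg_at_k` and the toy grades are hypothetical and not taken from AMO-Bench's own evaluation code.

```python
from statistics import mean


def avg_at_k(results: list[list[bool]], k: int = 5) -> float:
    """Assumed Avg@k: mean over problems of the fraction of k attempts graded correct.

    `results` holds one list per problem, each containing k pass/fail grades.
    This is a hypothetical helper illustrating the metric, not AMO-Bench's scorer.
    """
    per_problem = []
    for attempts in results:
        if len(attempts) != k:
            raise ValueError(f"expected {k} attempts per problem, got {len(attempts)}")
        # Fraction of this problem's k attempts that were graded correct.
        per_problem.append(sum(attempts) / k)
    # Average the per-problem accuracies across the benchmark.
    return mean(per_problem)


# Hypothetical grades for three problems, five attempts each.
toy = [
    [True, True, True, True, True],       # 5/5 correct -> 1.0
    [True, False, True, False, False],    # 2/5 correct -> 0.4
    [False, False, False, False, False],  # 0/5 correct -> 0.0
]
print(round(avg_at_k(toy), 3))  # roughly 0.467
```

Averaging over several attempts per problem, rather than grading a single attempt, reduces the variance that sampling randomness adds to the score.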