
Mathematical Reasoning on AMO-Bench (Avg@5)

Best result: 0.646 Avg@5 (GPT-5.1-high)

[Chart: Avg@5 scores; x-axis date: Jan 23, 2026]

Evaluation Results

Date     Avg@5  Method  Links
2026.01  0.646
2026.01  0.611
2026.01  0.588
2026.01  0.584
2026.01  0.56
2026.01  0.56
2026.01  0.56
2026.01  0.544
2026.01  0.529
2026.01  0.528
2026.01  0.521
2026.01  0.516
2026.01  0.48
2026.01  0.48
2026.01  0.48
2026.01  0.46
2026.01  0.435
2026.01  0.431
2026.01  0.416
2026.01  0.4
2026.01  0.4
2026.01  0.4
2026.01  0.397
2026.01  0.392
2026.01  0.38
2026.01  0.376
2026.01  0.366
2026.01  0.364
2026.01  0.364
2026.01  0.36
2026.01  0.36
2026.01  0.352
2026.01  0.324
2026.01  0.284
2026.01  0.28
2026.01  0.28
2026.01  0.265
2026.01  0.22
2026.01  0.207
2026.01  0.2
2026.01  0.199
2026.01  0.188
2026.01  0.18
2026.01  0.176
2026.01  0.172
2026.01  0.14
2026.01  0.08
2026.01  0.06