
Mathematical Problem Solving on AIME 25

93.3 Accuracy

gpt-oss-20b-high

[Chart: results over time, Jan 27, 2026 to Feb 10, 2026]
Updated 4d ago

Evaluation Results

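| Method | Date | Accuracy | (unlabeled) | (unlabeled) | (unlabeled) | Links |
|---|---|---|---|---|---|---|
|  | 2026.02 | 93.3 | 1.74 | - | - |  |
|  | 2026.02 | 93.3 | 0.55 | - | - |  |
|  | 2026.02 | 93.3 | 1.57 | - | - |  |
|  | 2026.02 | 86.7 | 0.74 | - | - |  |
|  | 2026.02 | 83.3 | 0.49 | - | - |  |
|  | 2026.02 | 80 | 0.5 | - | - |  |
|  | 2026.02 | 73.3 | 0.44 | - | - |  |
|  | 2026.02 | 73.3 | 0.43 | - | - |  |
|  | 2026.02 | 73.3 | - | 16.6 | - |  |
|  | 2026.02 | 73.3 | - | - | - |  |
|  | 2026.02 | 70 | 0.39 | - | - |  |
|  | 2026.02 | 66.7 | - | 10 | - |  |
|  | 2026.02 | 66.7 | - | 10 | - |  |
|  | 2026.02 | 66.7 | - | - | - |  |
|  | 2026.02 | 66.7 | - | - | - |  |
|  | 2026.02 | 63.3 | - | 20 | - |  |
|  | 2026.02 | 63.3 | - | 6.6 | - |  |
|  | 2026.02 | 63.3 | - | 6.6 | - |  |
|  | 2026.02 | 63.3 | - | - | - |  |
|  | 2026.02 | 60 | - | 3.3 | - |  |
|  | 2026.02 | 60 | - | - | - |  |
|  | 2026.02 | 56.7 | 0.43 | - | - |  |
|  | 2026.02 | 56.7 | - | - | - |  |
|  | 2026.02 | 56.7 | - | - | - |  |
|  | 2026.02 | 56.7 | - | - | - |  |
|  | 2026.02 | 56.7 | - | - | - |  |
|  | 2026.02 | 56.7 | - | - | - |  |
|  | 2026.01 | 54.55 | - | - | - |  |
|  | 2026.02 | 53.3 | - | 10 | - |  |
|  | 2026.02 | 53.3 | - | - | - |  |
|  | 2026.02 | 53.3 | - | - | - |  |
|  | 2026.02 | 53.3 | - | - | - |  |
|  | 2026.02 | 53.3 | - | - | - |  |
|  | 2026.02 | 53.3 | - | - | - |  |
|  | 2026.02 | 53.3 | - | - | - |  |
|  | 2026.02 | 53.3 | - | - | - |  |
|  | 2026.01 | 50 | - | - | - |  |
|  | 2026.01 | 50 | - | - | - |  |
|  | 2026.02 | 50 | - | 6.7 | - |  |
|  | 2026.02 | 50 | - | 6.7 | - |  |
|  | 2026.02 | 50 | - | - | - |  |
|  | 2026.02 | 50 | - | - | - |  |
|  | 2026.02 | 50 | - | - | - |  |
|  | 2026.02 | 46.7 | - | 3.4 | - |  |
|  | 2026.02 | 46.7 | - | - | - |  |
|  | 2026.02 | 43.3 | 0.52 | - | - |  |
|  | 2026.02 | 43.3 | - | - | - |  |
|  | 2026.02 | 43.3 | - | 0 | - |  |
|  | 2026.01 | 40.91 | - | - | - |  |
|  | 2026.01 | 40.91 | - | - | - |  |
|  | 2026.02 | 40 | 0.07 | - | - |  |
|  | 2026.02 | 40 | - | - | - |  |
|  | 2026.01 | 22.73 | - | - | - |  |
|  | 2026.02 | 10 | 0.04 | - | - |  |
|  | 2026.02 | - | - | - | 24.36 |  |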