
Mathematical Reasoning on AIME-120

58.89 Accuracy

BF16 Baseline

[Chart omitted; date axis label: Jan 21, 2026]
Updated 4d ago

Evaluation Results

Method  Date     Accuracy  (unlabeled)  (unlabeled)  Links
        2026.01  58.89     70.63        -
        2026.01  58.89     -            -
        2026.01  41.11     -            -
        2026.01  36.67     60.78        -9.85
        2026.01  34.17     -            -
        2026.01  32.78     58.28        -12.35
        2026.01  25        -            -
        2026.01  22.78     -            -
        2026.01  21.67     48.72        -
        2026.01  21.67     -            -
        2026.01  16.39     -            -
        2026.01  12.5      41.31        -7.41
        2026.01  11.11     41.1         -
        2026.01  11.11     -            -
        2026.01  10        38.39        -10.33
        2026.01  10        -            -
        2026.01  5.15      -            -
        2026.01  3.89      -            -
        2026.01  3.33      -            -
        2026.01  0.83      -            -
        2026.01  0.56      -            -
        2026.01  0.28      17.34        -23.76
        2026.01  0         4.84         -36.26
        2026.01  0         21.44        -19.66
        2026.01  0         2.11         -46.61
        2026.01  0         -            -
        2026.01  0         -            -
        2026.01  0         -            -
        2026.01  0         -            -
        2026.01  0         -            -
        2026.01  0         -            -
        2026.01  0         -            -
        2026.01  0         -            -
        2026.01  0         -            -
        2026.01  0         -            -