
Mathematical Reasoning on GSM8K (val)

Best result: 90.8 accuracy (OBLR-PO)

Chart: accuracy over time (Feb 16, 2025 to Feb 5, 2026)

Evaluation Results

Date     Accuracy  Other metric 1  Other metric 2
2025.11  90.8      -               -
2025.11  90.4      -               -
2025.11  89.3      -               -
2025.11  89        -               -
2025.11  88.6      -               -
2025.11  88.3      -               -
2025.11  88        -               -
2025.11  87.7      -               -
2025.11  87        -               -
2025.11  86.7      -               -
2026.02  77.78     -               -
2026.02  75.11     -               -
2026.02  74.83     -               -
2026.02  74.58     -               -
2026.02  73.31     -               -
2026.02  72.76     -               -
2026.02  72.66     -               -
2026.02  72.09     -               -
2026.02  69.91     -               -
2026.02  69.5      -               -
2025.02  62.93     -               -
2025.02  62.32     -               -
2025.02  60.88     -               -
2025.02  60.58     -               -
2025.02  60.35     -               -
2025.02  60.12     -               -
2025.02  60.05     -               -
2025.02  59.97     -               -
2025.02  59.29     -               -
2025.02  59.21     -               -
2025.02  58.98     -               -
2025.02  58.91     -               -
2025.02  58.38     -               -
2025.02  57.85     -               -
2025.02  57.16     -               -
2025.02  56.48     -               -
2025.02  55.88     -               -
2026.02  54.79     -               -
2025.02  53.53     -               -
2025.02  53.53     -               -
2026.02  53.22     -               -
2025.02  53.15     -               -
2025.02  52.16     -               -
2025.02  51.78     -               -
2025.02  51.78     -               -
2025.02  51.71     -               -
2025.02  51.1      -               -
2025.02  50.57     -               -
2025.02  50.27     -               -
2025.02  49.51     -               -
2025.02  49.43     -               -
2026.02  49.13     -               -
2026.02  47.28     -               -
2025.02  47.08     -               -
2025.02  46.7      -               -
2025.02  46.55     -               -
2025.02  46.4      -               -
2025.02  45.03     -               -
2025.02  44.73     -               -
2025.02  43.44     -               -
2026.02  42.85     -               -
2025.02  39.95     -               -
2025.02  39.2      -               -
2025.02  37.53     -               -
2025.02  35.63     -               -
2025.02  35.1      -               -
2025.02  34.5      -               -
2023.11  -         0.535           0.601
2023.11  -         0.586           0.738
2023.11  -         0.779           0.87
2023.11  -         0.194           -
2023.11  -         0.728           0.815
2023.11  -         0.747           0.823
2023.11  -         0.777           0.866
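The results list accuracy without stating how answers were scored. A common convention for GSM8K is exact match on the final numeric answer: each reference solution ends with a line `#### <answer>`, and the model's answer is taken to be the last number in its output. The sketch below assumes that convention. The helper names (`gold_answer`, `predicted_answer`, `accuracy`) are hypothetical, and the entries in the table may have used different extraction or normalization rules.

```python
import re
from typing import List, Optional

# Matches integers or decimals, optionally negative, with thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def gold_answer(solution: str) -> str:
    """GSM8K reference solutions end with a line of the form '#### <answer>'."""
    return solution.split("####")[-1].strip().replace(",", "")


def predicted_answer(output: str) -> Optional[str]:
    """Assumed extraction rule: the last number appearing in the model output."""
    matches = _NUMBER.findall(output)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def accuracy(outputs: List[str], solutions: List[str]) -> float:
    """Percentage of problems whose extracted answer equals the gold answer."""
    if len(outputs) != len(solutions) or not solutions:
        raise ValueError("need one output per problem and at least one problem")
    correct = 0
    for out, sol in zip(outputs, solutions):
        pred = predicted_answer(out)
        # Compare numerically so that "18" and "18.0" count as the same answer.
        if pred is not None and float(pred) == float(gold_answer(sol)):
            correct += 1
    return 100.0 * correct / len(solutions)


if __name__ == "__main__":
    outputs = ["She earns 9 * 2 = 18 dollars.", "The total is 1,234.", "Not sure."]
    solutions = ["9 * 2 = 18\n#### 18", "Add the parts.\n#### 1234", "Count them.\n#### 7"]
    print(accuracy(outputs, solutions))
```

Scores are reported on a 0 to 100 scale to match the values in the table. Choosing the last number in the output is a simple heuristic; stricter or looser extraction rules can change the resulting accuracy.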