Scientific Reasoning on GPQA (Accuracy & Generation Length)

45.5 Accuracy

Full-CoT

[Chart: accuracy over time, May 27, 2025 to Jan 30, 2026]
Updated 3d ago

Evaluation Results

Date      Accuracy   Generation Length
2026.01   45.5       179.3
2026.01   45.5       115.7
2025.05   45         -
2025.05   43.94      -
2026.01   43.6       130.8
2026.01   43.6       8
2026.01   43.6       8
2026.01   42.8       6
2025.05   42.57      -
2026.01   41.8       8
2026.01   41.8       8
2026.01   40         229.5
2026.01   39.6       0
2026.01   38.1       89.3
2026.01   37.1       30.8
2026.01   36.4       8
2026.01   36.4       8
2026.01   34.5       9,492
2026.01   34.5       150.2
2026.01   34.5       8
2026.01   32.7       8
2026.01   31.4       6
2026.01   30.7       0
2026.01   28.9       0
2026.01   28.8       3,670
2026.01   28.6       6
2026.01   28.3       1,708
2026.01   27.8       3,815
2026.01   27.7       28.7
2026.01   27.3       5,589
2026.01   27.3       8
2026.01   27.1       6
2026.01   26.8       2,129
2026.01   26         6
2026.01   24.2       2,188
2026.01   20.7       1,044
2026.01   20.4       33.9
2026.01   20         3,655
2026.01   19.7       743
2026.01   19.2       4,569
2025.05   13.87      -
2025.05   13.64      -
2025.05   11.11      -
2025.12   5.25       -
2025.12   4.35       -
2025.12   4.35       -
2025.12   3.99       -
2025.12   3.62       -
2025.12   3.44       -
2025.12   3.1        -