Scientific Reasoning on GPQA Diamond (pass@1 and avg@10)

64.58 Pass@1

Large Model

[Chart: scores over time, Jun 3, 2025 to Feb 9, 2026]
Updated 3d ago

Evaluation Results

Date      Method    Score    Links
2026.02             64.58    -
2026.02             63.64    -
2026.02             61.62    -
2026.02             60.61    -
2026.02             56.82    -
2026.02             48.99    -
2025.06             47.98    -
2025.06             47.47    -
2026.02             42.68    -
2026.02             41.29    -
2025.06             40.4     -
2025.06             38.38    -
2026.02             37.33    -
                    35.86    -
2026.02             30.3     -
2026.02             0.632    -
2026.02             0.621    -
2026.02             0.611    -
2026.02             0.6      -
2026.02             0.566    -
2026.02             0.561    -
2026.02             0.559    -
2026.02             0.553    -
2026.02             0.505    -
2026.02             0.5      -
2026.02             0.465    -
2026.02             0.444    -
2026.02             0.429    -
2026.02             0.424    -
2026.02             0.417    -
2026.02             0.367    -
2026.02             0.349    -