
Scientific Reasoning on SuperGPQA

50.1 Mean@1

Agentic Proposing

[Chart: Mean@1 over time, Aug 25, 2025 to Feb 11, 2026]
Updated 4d ago

Evaluation Results

Date      Mean@1   Links
2026.02   50.1     -
2026.02   45.7     -
2026.02   45.2     -
2026.02   44.7     -
2026.02   44.4     -
2026.02   44.4     -
2026.02   43.6     -
2026.02   43.6     -
2026.02   43.6     -
2026.02   42.8     -
2026.02   42.7     -
2026.02   42.3     -
2026.02   42.1     -
2026.02   41.1     -
2026.02   41       -
2026.02   39.9     -
2026.02   39.3     -
2026.02   38.8     -
2026.02   38       -
2026.02   35.1     -
2026.02   33.1     -
2026.02   32.5     -
2026.02   32       -
2025.08   29.02    -
2026.02   28.5     -
2025.08   28.37    -
2025.08   28.1     -
2025.08   27.69    -
2025.08   27.59    -
2025.08   25.85    -
2025.08   25.26    -
2025.08   21.95    -
2025.08   19.81    -
2025.08   18.16    -