Scientific Reasoning on SciBench

44.3 Accuracy

GPT-4o

[Chart: accuracy over time, Apr 2, 2026 to May 25, 2026]
Updated 7d ago

Evaluation Results

Method Links
2026.04 | 44.3 | -
2026.04 | 42.6 | -
2026.05 | 41.2 | -
2026.05 | 40.4 | -
2026.05 | 38.7 | -
2026.05 | 38.4 | -
2026.05 | 38.2 | -
2026.05 | 37.92 | -
2026.05 | 37.8 | -
        | 37.6 | -
2026.05 | 37.4 | -
2026.05 | 37.3 | -
2026.05 | 37 | -
2026.05 | 36.83 | -
2026.05 | 36.5 | -
2026.05 | 36.05 | -
2026.05 | 35.81 | -
2026.04 | 35.8 | -
2026.05 | 35.71 | -
2026.05 | 35.24 | -
        | 35.1 | -
2026.05 | 34.92 | -
2026.05 | 34.91 | -
2026.05 | 34.86 | -
2026.05 | 34.52 | -
2026.05 | 34.32 | -
2026.05 | 34.18 | -
2026.05 | 33.2 | -
2026.05 | 33.04 | -
2026.05 | 32.4 | -
        | 28.7 | -
2026.05 | 28.46 | -
2026.05 | 27.01 | -
2024.01 | - | 28.52
2024.01 | - | 12.17
2024.01 | - | 0.4
2024.01 | - | 1.54
2024.01 | - | 1.2
2024.01 | - | 2.4
2024.01 | - | 2.4
2024.01 | - | 3.77
2024.01 | - | 1.03
2024.01 | - | 3.6
2024.01 | - | 3.6
2024.01 | - | 4.63
2024.01 | - | 6.17
2024.01 | - | 6.23
2024.01 | - | 1.37
2024.01 | - | 4.29
2024.01 | - | 5.15