
Scientific Reasoning on SciBench

Score: 28.52

GPT-4

[Chart omitted; data point dated Jan 15, 2024]
Updated 16d ago

Evaluation Results

Method    Links
2024.01    28.52    -
2024.01    12.17    -
2024.01    6.23     -
2024.01    6.17     -
2024.01    5.15     -
2024.01    4.63     -
2024.01    4.29     -
2024.01    3.77     -
2024.01    3.6      -
2024.01    3.6      -
2024.01    2.4      -
2024.01    2.4      -
2024.01    1.54     -
2024.01    1.37     -
2024.01    1.2      -
2024.01    1.03     -
2024.01    0.4      -
2026.04    -        35.8
2026.04    -        42.6
2026.04    -        44.3