
Scientific Reasoning on MMLU-STEM

87.4 Accuracy

GPT-4o

[Chart: accuracy over time, Oct 31, 2025 to Apr 2, 2026]
Updated 15d ago

Evaluation Results

Date     Accuracy  (unlabeled)  Method  Links
2026.04  87.4      -
2026.04  80.7      -
2026.04  79.2      -
2025.10  75.5      -
2025.10  72.7      -
2025.12  72.3      63.5
2025.12  65.1      55.6
2025.12  64.2      65.2
2025.12  62.4      57.8
2025.12  61.6      70.5
2025.12  60.1      69.1
2025.12  59        68.7
2025.12  58.2      68.2
2025.12  58.1      60.3
2025.12  56.9      58.1
2025.12  56.6      57.2
2025.12  56.5      51.9
2025.12  56.2      57.9
2025.10  56        -
2025.10  54.7      -
2025.10  48.7      -
2025.10  47        -
2025.12  45.4      46.1
2025.12  40.3      43.3
2025.10  15        -
2025.10  12.1      -
2025.10  7.5       -