Language Understanding on MMLU (Subject Performance)

Top result: GPT-4, 80.7 Medicine accuracy

[Chart: Medicine accuracy over time, Feb 19, 2024 to Feb 5, 2026]

Evaluation Results

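Date | Medicine Acc. | further scores, columns 2-17 ("-" = not reported)
2024.02 | 80.7 | 76 | 89.7 | 66.7 | 85 | - | - | - | - | - | - | - | - | - | - | - | -
2024.02 | 70.1 | 39 | 73.6 | 46.4 | 79 | - | - | - | - | - | - | - | - | - | - | - | -
2024.02 | 69.3 | 35 | 82.6 | 54.4 | 78 | - | - | - | - | - | - | - | - | - | - | - | -
2024.02 | 67.6 | 41 | 74.4 | 53.5 | 74 | - | - | - | - | - | - | - | - | - | - | - | -
2024.02 | 67.6 | 33 | 81.8 | 52.6 | 77 | - | - | - | - | - | - | - | - | - | - | - | -
2024.02 | 65.9 | 34 | 76.9 | 50.9 | 74 | - | - | - | - | - | - | - | - | - | - | - | -
2024.02 | 65.9 | 27 | 68.6 | 33.3 | 67 | - | - | - | - | - | - | - | - | - | - | - | -
2024.02 | 60.7 | 47 | 69.4 | 40.4 | 68 | - | - | - | - | - | - | - | - | - | - | - | -
2024.02 | 59.5 | 34 | 70.2 | 36.9 | 70 | - | - | - | - | - | - | - | - | - | - | - | -
2024.02 | 43.6 | 31 | 70 | 30 | 65 | - | - | - | - | - | - | - | - | - | - | - | -
2024.02 | 41.9 | 14 | 57.8 | 28.1 | 58 | - | - | - | - | - | - | - | - | - | - | - | -
2026.02 | 40.5 | - | - | - | - | 51.1 | 52 | - | - | - | - | - | - | - | - | - | -
2026.02 | 39.7 | - | - | - | - | 50 | 49 | - | - | - | - | - | - | - | - | - | -
2026.02 | 39.3 | - | - | - | - | 49.6 | 50 | - | - | - | - | - | - | - | - | - | -
2025.12 | 31.2 | - | - | 26.7 | - | - | - | 36.8 | 29.5 | 26.2 | 27.3 | 18 | 28 | - | - | - | -
2025.12 | 30.5 | - | - | 30 | - | - | - | 35.5 | 32.1 | 31.5 | 29.2 | 19 | 29.3 | - | - | - | -
2025.12 | 25.2 | - | - | 32.1 | - | - | - | 28.3 | 27.1 | 28 | 24.1 | 17 | 26 | - | - | - | -
2025.06 | - | - | - | - | - | - | - | - | - | - | - | - | - | 37.5 | 51.5 | 41.6 | 49.7
2025.06 | - | - | - | - | - | - | - | - | - | - | - | - | - | 38.5 | 50.8 | 41.2 | 49.1
2025.06 | - | - | - | - | - | - | - | - | - | - | - | - | - | 38.5 | 50.8 | 41.9 | 49.3
2025.06 | - | - | - | - | - | - | - | - | - | - | - | - | - | 38.8 | 53.5 | 42.2 | 50.7
2025.06 | - | - | - | - | - | - | - | - | - | - | - | - | - | 68.4 | 82.5 | 63.2 | 76.4
2025.06 | - | - | - | - | - | - | - | - | - | - | - | - | - | 68.5 | 81.5 | 63 | 74.4
2025.06 | - | - | - | - | - | - | - | - | - | - | - | - | - | 68.5 | 81.7 | 63.3 | 75.5
2025.06 | - | - | - | - | - | - | - | - | - | - | - | - | - | 68.6 | 82.7 | 63.8 | 76.5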