Question Answering on MMLU-Pro standard (test)

80.8 Accuracy

GPT-4.1

[Chart: Accuracy (axis ticks 64.16, 68.48, 72.8, 77.12), Nov 18, 2025]
Updated 16d ago

Evaluation Results

Method    Links
Date      Accuracy
2025.11   80.8      -       -       -
2025.11   80.6      71.5    0.172   0.172
2025.11   80.2      81.2    0.124   0.045
2025.11   79.7      -       -       -
2025.11   79.7      73.4    0.191   0.19
2025.11   79.7      68.4    0.191   0.189
2025.11   79.6      79.8    0.141   0.092
2025.11   79.4      72.4    0.179   0.162
2025.11   79        75.5    0.172   0.152
2025.11   78.8      76.9    0.162   0.12
2025.11   78.8      81.7    0.135   0.059
2025.11   78.6      77.7    0.14    0.059
2025.11   77.1      77.6    0.151   0.074
2025.11   77        -       -       -
2025.11   72        -       -       -
2025.11   72        74.4    0.261   0.259
2025.11   72        53.5    0.334   0.323
2025.11   71.6      77.2    0.236   0.23
2025.11   71.1      76.4    0.25    0.243
2025.11   70.9      -       -       -
2025.11   70.9      56      0.287   0.288
2025.11   70.9      52      0.248   0.151
2025.11   70.7      77.3    0.174   0.091
2025.11   70.1      73.9    0.203   0.147
2025.11   70        72      0.21    0.156
2025.11   69.8      73.4    0.185   0.07
2025.11   66.6      -       -       -
2025.11   66.6      65.8    0.299   0.294
2025.11   66.6      73.5    0.196   0.092
2025.11   65.9      72.3    0.209   0.129
2025.11   65.7      72.1    0.229   0.18
2025.11   64.8      70      0.263   0.242