
Question Answering on MedQA standard (test)

Accuracy: 94 (GPT-4.1)

[Chart: accuracy over time; y-axis ticks 72.68, 78.215, 83.75, 89.285; x-axis Nov 18, 2025]
Updated 16d ago

Evaluation Results

Date     Accuracy  (unlabeled)  (unlabeled)  (unlabeled)
2025.11  94        81.2         0.056        0.073
2025.11  93.9      82.2         0.052        0.041
2025.11  93.6      -            -            -
2025.11  93.6      67.9         0.059        0.053
2025.11  88.8      75           0.089        0.029
2025.11  88.3      80.7         0.097        0.096
2025.11  88.2      79.6         0.089        0.044
2025.11  88        -            -            -
2025.11  87        65.5         0.112        0.072
2025.11  86.9      -            -            -
2025.11  86.9      68.3         0.123        0.121
2025.11  86.9      71.3         0.127        0.127
2025.11  86.7      68           0.118        0.096
2025.11  86.6      67.9         0.117        0.088
2025.11  82.9      65.6         0.13         0.043
2025.11  82.3      -            -            -
2025.11  82.3      66.5         0.176        0.176
2025.11  82.3      64.4         0.14         0.078
2025.11  81.4      70           0.161        0.148
2025.11  79.1      70.4         0.151        0.029
2025.11  78.9      -            -            -
2025.11  78.9      69.7         0.194        0.196
2025.11  78.9      77.7         0.147        0.083
2025.11  78.5      69.9         0.155        0.052
2025.11  78        66.6         0.165        0.099
2025.11  78        70.5         0.158        0.047
2025.11  76.4      -            -            -
2025.11  76.4      67.5         0.223        0.219
2025.11  76.4      64.2         0.215        0.199
2025.11  75.6      67.2         0.173        0.044
2025.11  74.9      67.5         0.207        0.181
2025.11  73.5      66.3         0.238        0.224