
Question Answering on MedXpertQA standard (test)

Top result: GPT-4.1, 41.7% accuracy

Chart: Accuracy over time (Nov 18, 2025)

Evaluation Results

Date     Accuracy (%)  Metric 2  Metric 3  Metric 4
2025.11  41.7          0.551     0.559     0.564
2025.11  40.4          0.583     0.399     0.402
2025.11  40.0          0.656     0.309     0.29
2025.11  39.7          -         -         -
2025.11  31.3          0.548     0.542     0.573
2025.11  30.9          0.55      0.434     0.467
2025.11  30.7          -         -         -
2025.11  29.7          0.574     0.302     0.275
2025.11  25.8          0.565     0.672     0.689
2025.11  25.2          0.525     0.658     0.683
2025.11  25.1          0.582     0.339     0.359
2025.11  24.7          -         -         -
2025.11  24.7          0.508     0.734     0.732
2025.11  24.7          0.51      0.734     0.736
2025.11  21.1          0.516     0.608     0.66
2025.11  20.7          -         -         -
2025.11  20.7          0.481     0.788     0.789
2025.11  20.7          0.459     0.57      0.617
2025.11  20.7          0.539     0.303     0.345
2025.11  20.0          -         -         -
2025.11  20.0          0.526     0.753     0.763
2025.11  20.0          0.539     0.52      0.576
2025.11  20.0          0.549     0.39      0.444
2025.11  18.7          0.509     0.526     0.607
2025.11  18.6          0.556     0.514     0.6
2025.11  18.1          0.536     0.64      0.695
2025.11  17.6          -         -         -
2025.11  17.6          0.505     0.81      0.808
2025.11  17.6          0.497     0.534     0.572
2025.11  17.2          0.564     0.395     0.472
2025.11  16.9          0.518     0.733     0.768
2025.11  16.0          0.548     0.789     0.808