
Medical Reasoning on MedAgentsBench Hard Subsets

[Chart: MEDQA: 0.52; label: MEDCOG-META; date: Feb 8, 2026]
Updated 4d ago

Evaluation Results

Method  Links
2026.02  0.52  0.36  0.356  0.44  0.2   0.375  0.443
2026.02  0.5   0.32  0.288  0.36  0.19  0.332  0.182
2026.02  0.48  0.31  0.384  0.37  0.18  0.345  0.124
2026.02  0.45  0.25  0.37   0.42  0.15  0.328  0.153
2026.02  0.43  0.3   0.288  0.08  0.15  0.25   -0.036
2026.02  0.41  0.34  0.342  0.34  0.13  0.312  0.331
2026.02  0.39  0.3   0.26   0.35  0.1   0.28   -
2026.02  0.37  0.35  0.301  0.43  0.06  0.302  0.104
2026.02  0.36  0.22  0.247  0.08  0.11  0.203  -0.169
2026.02  0.34  0.26  0.26   0.22  0.11  0.238  -3
2026.02  0.32  0.25  0.247  0.21  0.09  0.223  -
2026.02  0.28  0.29  0.274  0.09  0.2   0.227  -