Accuracy and F1 on MedQA (Medical Reasoning)

Best accuracy: 83.76 (COTCAgent)

[Chart: MedQA accuracy over time, May 14, 2026]

Evaluation Results

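| Rank | Method | Date | Accuracy | F1 |
|---|---|---|---|---|
| 1 | COTCAgent | 2026.05 | 83.76 | 82.9 |
| 2 | | 2026.05 | 82.84 | 81.7 |
| 3 | | 2026.05 | 82.33 | 80.6 |
| 4 | | 2026.05 | 81.76 | 80.2 |
| 5 | | 2026.05 | 81.66 | 79.7 |
| 6 | | 2026.05 | 81.48 | 80.1 |
| 7 | | 2026.05 | 81.14 | 79.9 |
| 8 | | 2026.05 | 80.76 | 78.8 |
| 9 | | 2026.05 | 79.38 | 77.7 |
| 10 | | 2026.05 | 79.28 | 77.9 |
| 11 | | 2026.05 | 78.96 | 77.5 |
| 12 | | 2026.05 | 78.84 | 78.1 |
| 13 | | 2026.05 | 78.76 | 76.8 |
| 14 | | 2026.05 | 78.51 | 77.4 |
| 15 | | 2026.05 | 78.29 | 76.6 |
| 16 | | 2026.05 | 78.16 | 76.9 |
| 17 | | 2026.05 | 77.73 | 76.1 |
| 18 | | 2026.05 | 77.59 | 75.8 |
| 19 | | 2026.05 | 77.51 | 75.8 |
| 20 | | 2026.05 | 77.36 | 75.6 |
| 21 | | 2026.05 | 77.08 | 74.8 |
| 22 | | 2026.05 | 76.04 | 74.7 |
| 23 | | 2026.05 | 75.98 | 74.6 |
| 24 | | 2026.05 | 75.71 | 73.6 |
| 25 | | 2026.05 | 74.06 | 71.9 |
| 26 | | 2026.05 | 72.14 | 70.8 |
| 27 | | 2026.05 | 71.53 | 70.3 |
| 28 | | 2026.05 | 71.28 | 69.7 |
| 29 | | 2026.05 | 70.62 | 69.1 |
| 30 | | 2026.05 | 69.37 | 68.1 |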
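The page reports both accuracy and F1 but does not say how F1 is averaged across answer options. For single-label multiple-choice questions, micro-averaged F1 equals accuracy, so the consistently lower F1 values above suggest a different averaging scheme. The sketch below assumes macro averaging (an unweighted mean of per-option F1 scores). The function names `accuracy` and `macro_f1` and the sample answers are hypothetical illustrations, not the leaderboard's own scoring code.

```python
def _check(y_true, y_pred):
    # Both sequences must be non-empty and aligned question by question.
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")


def accuracy(y_true, y_pred):
    """Fraction of questions where the predicted option matches the gold option."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-option F1 scores (macro averaging is an assumption)."""
    _check(y_true, y_pred)
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # correctly chose this option
        fp = sum(t != label and p == label for t, p in pairs)  # chose it when it was wrong
        fn = sum(t == label and p != label for t, p in pairs)  # missed it when it was right
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical gold answers and model predictions for six questions.
    gold = ["A", "B", "C", "D", "A", "B"]
    pred = ["A", "B", "D", "D", "A", "C"]
    print(f"Accuracy: {100 * accuracy(gold, pred):.2f}")  # Accuracy: 66.67
    print(f"Macro F1: {100 * macro_f1(gold, pred):.2f}")  # Macro F1: 58.33
```

Macro averaging weights every answer option equally, so a model that rarely picks the correct letter for one option is penalized in F1 even when its overall accuracy is high. That is one plausible reason F1 trails accuracy in every row of the table.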