Medical Question Answering on PubMedQA (Accuracy)

81.4 Accuracy

HuatuoGPT-o1-70B

[Chart: accuracy over time, Apr 1, 2025 to May 16, 2026]
Updated 15d ago

Evaluation Results

Date     Accuracy
2025.04  81.4
2025.04  80.1
2025.04  80
2025.07  80
2025.04  79.9
2025.04  79.3
2025.04  79.2
2025.07  79
2026.03  78.8
2025.04  78.6
2026.03  78.5
2025.07  78.4
2026.03  78.3
2025.04  78.1
2026.03  78
2026.03  77.8
2025.07  77.7
2025.04  77.6
2026.03  77.6
2025.04  77.5
2025.04  77.5
2025.07  77.2
2025.07  76.8
2025.07  76.4
2025.07  76.2
2025.04  76
2025.04  75.8
2025.07  75.8
2026.03  75.6
2025.04  75.5
2026.03  75.4
2026.03  74.8
2025.07  74.2
2025.07  74.1
2026.03  74
2025.07  73.4
2025.07  73.4
2026.03  72.9
2025.04  72.6
2026.03  72.5
2026.03  72
2025.04  71.3
2025.04  71.3
2026.03  71.2
2025.04  71.1
2025.04  70.8
2025.04  70.1
2026.03  69.5
2025.08  69.3
2025.04  68.9
2025.08  68.7
2025.07  68.4
2025.04  68
2025.08  66.6
2025.04  63.8
2025.04  63.4
2026.03  62
2026.03  59.4
2026.05  59.2
2026.05  59.2
2026.03  57.7
2026.02  56.8
2025.08  56.7
2025.08  56.1
2026.02  56
2026.05  56
2026.05  55.8
2026.02  55.2
2026.02  55.2
2026.02  55.2
2026.02  54.8
2026.05  54.6
2026.02  54.2
2026.03  54.1
2026.02  54
2026.02  54
2026.02  54
2026.05  53.6
2025.04  52.7
2026.05  52.6
2026.02  51.8
2026.02  51.8
2026.02  51.6
2026.05  51.2
2026.05  51.2
2026.05  50.6
2026.05  50.2
2026.02  50
2026.02  49.8
-        49.8
-        49.7
2026.05  49.2
2026.03  49
2025.08  48.7
2026.05  48.6
2026.05  48.4
2026.05  47.2
-        46.8
2026.05  46.8
2026.05  46.6
Showing 100 of 117 rows