Medical Reasoning on PubMedQA
Top result: TMA-AllCompon, 78.3 Accuracy
[Chart: Accuracy over time, x-axis from Aug 11, 2025 to Feb 5, 2026]
Updated 9d ago
Evaluation Results
Method              Base Model    Date     Accuracy
TMA-AllCompon       GPT-4o        2025.08  78.3
MedAgents           GPT-4o        2025.08  76.4
MDAgents            GPT-4o        2025.08  75
TMA-AllCompon       MedGemma-4B   2025.08  73.4
DyLAN               GPT-4o        2025.08  72.8
ReConcile           GPT-4o        2025.08  70.8
TMA-AllCompon       Gemma-3-4B    2025.08  59
MedRoute            -             2026.02  38.6
MAM                 -             2026.02  37.3
GPT-4.1-mini        -             2026.02  34.5
Medichat-Llama3-8B  -             2026.02  32.81
Qwen3-8B            -             2026.02  20.65
MedAlpaca-7B        -             2026.02  19.9