Medical Reasoning on MedBullets
Best result: 80.8 Accuracy (MDAgents), Aug 11, 2025
Updated 17d ago
Evaluation Results
| Method            | Base Model  | Date    | Accuracy |
|-------------------|-------------|---------|----------|
| MDAgents          | GPT-4o      | 2025.08 | 80.8     |
| TMA-AllCompon     | GPT-4o      | 2025.08 | 78.8     |
| ReConcile         | GPT-4o      | 2025.08 | 75.2     |
| MedAgents         | GPT-4o      | 2025.08 | 70       |
| DyLAN             | GPT-4o      | 2025.08 | 69.5     |
| TMA-AllCompon     | MedGemma-4B | 2025.08 | 48.2     |
| TMA-AllCompon     | Gemma-3-4B  | 2025.08 | 46.7     |
| DyLAN             | Gemma-3-4B  | 2025.08 | 35.3     |
| Single-Agent Best | Gemma-3-4B  | 2025.08 | 34.6     |
| TMA-AllCompon     | Gemma-3-4B  | 2025.08 | 33.9     |
| MedAgents         | Gemma-3-4B  | 2025.08 | 33.3     |
| ReConcile         | Gemma-3-4B  | 2025.08 | 29.3     |
| MDAgents          | Gemma-3-4B  | 2025.08 | 22.7     |