Medical Reasoning Accuracy on DDXPlus
Chart: Accuracy (DDXPlus) over time (latest point Aug 11, 2025); top result 77.9, MDAgents.
Evaluation Results
Method             Base model   Date     Accuracy (DDXPlus)
-----------------  -----------  -------  ------------------
MDAgents           GPT-4o       2025.08  77.9
TMA-AllCompon      GPT-4o       2025.08  74.9
Single-Agent Best  Gemma-3-4B   2025.08  72.1
TMA-AllCompon      MedGemma-4B  2025.08  70.2
MedAgents          Gemma-3-4B   2025.08  69.1
ReConcile          GPT-4o       2025.08  68
TMA-AllCompon      Gemma-3-4B   2025.08  67.3
MedAgents          GPT-4o       2025.08  66.6
DyLAN              Gemma-3-4B   2025.08  65.6
TMA-AllCompon      Gemma-3-4B   2025.08  65.3
MedAgents          Gemma-3-4B   2025.08  61
DyLAN              Gemma-3-4B   2025.08  60
ReConcile          Gemma-3-4B   2025.08  58
DyLAN              GPT-4o       2025.08  56
ReConcile          Gemma-3-4B   2025.08  54
MDAgents           Gemma-3-4B   2025.08  52.6
MDAgents           Gemma-3-4B   2025.08  49.3
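As a minimal sketch, the results listed above can be transcribed into Python to find the highest-scoring method for each base model. The `Result` type, the `RESULTS` list and the `best_per_base_model` helper are hypothetical names chosen for illustration; they are not part of any leaderboard API, and the rows are copied by hand from the evaluation results.

```python
from typing import NamedTuple


class Result(NamedTuple):
    # Hypothetical record type: one leaderboard row (all rows are dated 2025.08).
    method: str
    base_model: str
    accuracy: float


# Hypothetical hand transcription of the Evaluation Results rows, in page order.
RESULTS = [
    Result("MDAgents", "GPT-4o", 77.9),
    Result("TMA-AllCompon", "GPT-4o", 74.9),
    Result("Single-Agent Best", "Gemma-3-4B", 72.1),
    Result("TMA-AllCompon", "MedGemma-4B", 70.2),
    Result("MedAgents", "Gemma-3-4B", 69.1),
    Result("ReConcile", "GPT-4o", 68),
    Result("TMA-AllCompon", "Gemma-3-4B", 67.3),
    Result("MedAgents", "GPT-4o", 66.6),
    Result("DyLAN", "Gemma-3-4B", 65.6),
    Result("TMA-AllCompon", "Gemma-3-4B", 65.3),
    Result("MedAgents", "Gemma-3-4B", 61),
    Result("DyLAN", "Gemma-3-4B", 60),
    Result("ReConcile", "Gemma-3-4B", 58),
    Result("DyLAN", "GPT-4o", 56),
    Result("ReConcile", "Gemma-3-4B", 54),
    Result("MDAgents", "Gemma-3-4B", 52.6),
    Result("MDAgents", "Gemma-3-4B", 49.3),
]


def best_per_base_model(results):
    """Return {base_model: Result with the highest accuracy} (hypothetical helper).

    Uses a strict '>' comparison, so on a tie the row listed first is kept.
    """
    best = {}
    for r in results:
        if r.base_model not in best or r.accuracy > best[r.base_model].accuracy:
            best[r.base_model] = r
    return best


if __name__ == "__main__":
    # Print one line per base model, sorted by base-model name.
    for base, r in sorted(best_per_base_model(RESULTS).items()):
        print(f"{base}: {r.method} ({r.accuracy})")
```

Run as a script, it prints one line per base model: with these rows, GPT-4o is topped by MDAgents (77.9), Gemma-3-4B by Single-Agent Best (72.1) and MedGemma-4B by TMA-AllCompon (70.2).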