| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| MedMCQA | ReConcile | Accuracy | 86 | 58 | 22h ago |
| Professional Medicine | Majority Vote | Accuracy | 74.6 | 56 | 2mo ago |
| MedQA | ReConcile | Accuracy | 92.8 | 47 | 6d ago |
| HealthBench Hard | | Accuracy | 40.74 | 41 | 1mo ago |
| Medicine MedQA M-Med | Search-o1 | MedQA Score | 75.2 | 40 | 1mo ago |
| DDXPlus | DMoA | Accuracy (DDXPlus) | 83.37 | 36 | 15d ago |
| HealthBench | COTCAgent | Accuracy | 70.41 | 36 | 14d ago |
| DiSCQ | COTCAgent | Accuracy | 98.37 | 30 | 18d ago |
| Time-MMD | COTCAgent | Accuracy | 85.33 | 30 | 18d ago |
| MedQA | COTCAgent | Accuracy | 83.76 | 30 | 18d ago |
| MedCaseReasoning (test) | | Accuracy | 72.5 | 28 | 2mo ago |
| MedDDx (test) | MedLA+LLaMA3.1(8B) | Basic Accuracy | 48.2 | 28 | 3mo ago |
| MedDDx | MedLA+LLaMA3.1(8B) | Basic Accuracy | 48.2 | 22 | 3mo ago |
| XMEMRs | RE-MCDF | Recall | 42.33 | 22 | 3mo ago |
| NEEMRs | RE-MCDF | Recall | 46.13 | 22 | 3mo ago |
| Medicine MedBullets, MedXQA | LLaMA3.2-3B-Instruct (Full-shot) | Accuracy (MedBullets) | 84.87 | 18 | 1mo ago |
| Medbullets, MedMCQA, MedQA | + WIST (Reward) | MedBullets Score | 54.22 | 15 | 2mo ago |
| RareDis Sub (test) | Llama-3.1-8B-Instruct + MedSSR | Symptoms Accuracy | 80 | 13 | 1mo ago |
| PubMedQA | TMA-AllCompon | Accuracy | 78.3 | 13 | 1mo ago |
| MedBullets | MDAgents | Accuracy | 80.8 | 13 | 2mo ago |
| DeepTumorVQA | Photon | Fatty Liver Assessment | 77.3 | 13 | 2mo ago |
| MedQA | RecursiveMAS | Accuracy | 79.3 | 12 | 1mo ago |
| MedQA and MedMCQA mixture | FedAvg-PubSwap | Pass@1 | 59.4 | 12 | 1mo ago |
| Medical Reasoning | FedAvg-GRPO | Pass@1 | 59.7 | 12 | 1mo ago |
| CMB clin | GraphWalker | BLEU-1 | 29.68 | 12 | 1mo ago |