| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| Professional Medicine | Majority Vote | Accuracy | 74.6 | 56 | 25d ago |
| HealthBench Hard | | Accuracy | 40.74 | 41 | 4d ago |
| MedCaseReasoning (test) | | Accuracy | 72.5 | 28 | 16d ago |
| MedDDx (test) | MedLA+LLaMA3.1(8B) | Basic Accuracy | 48.2 | 28 | 1mo ago |
| MedDDx | MedLA+LLaMA3.1(8B) | Basic Accuracy | 48.2 | 22 | 1mo ago |
| XMEMRs | RE-MCDF | Recall | 42.33 | 22 | 1mo ago |
| NEEMRs | RE-MCDF | Recall | 46.13 | 22 | 1mo ago |
| MedQA | ReConcile | Accuracy | 92.8 | 21 | 17d ago |
| DDXPlus | MDAgents | Accuracy (DDXPlus) | 77.9 | 17 | 17d ago |
| Medbullets, MedMCQA, MedQA | + WIST (Reward) | MedBullets Score | 54.22 | 15 | 24d ago |
| RareDis Sub (test) | Llama-3.1-8B-Instruct + MedSSR | Symptoms Accuracy | 80 | 13 | 4d ago |
| PubMedQA | TMA-AllCompon | Accuracy | 78.3 | 13 | 9d ago |
| MedBullets | MDAgents | Accuracy | 80.8 | 13 | 17d ago |
| DeepTumorVQA | Photon | Fatty Liver Assessment | 77.3 | 13 | 22d ago |
| MedQA and MedMCQA mixture | FedAvg-PubSwap | Pass@1 | 59.4 | 12 | 3d ago |
| Medical Reasoning | FedAvg-GRPO | pass@1 | 59.7 | 12 | 3d ago |
| CMB clin | GraphWalker | BLEU-1 | 29.68 | 12 | 9d ago |
| MedQA | GraphWalker | EM | 61.1 | 12 | 9d ago |
| CMB | GraphWalker | Exact Match (EM) | 84.05 | 12 | 9d ago |
| Medical-O1-Reasoning-SFT | LLM-AutoDP | Wins | 1 | 12 | 1mo ago |
| Medical-O1-Reasoning-SFT (test) | LLM-AutoDP | Wins | 0.5127 | 12 | 1mo ago |
| MedAgentsBench Hard Subsets | MEDCOG-META | MEDQA | 0.52 | 12 | 1mo ago |
| PubMedQA | TMA-AllCompon | Token Cost (tokens/question) | 1,509 | 11 | 17d ago |
| MMLU-Pro | | Token Cost (tokens/question) | 1,350 | 11 | 17d ago |
| MedQA | | Token Cost | 1,742 | 11 | 17d ago |