| MedMCQA | | Accuracy 89.02 | | 253 | 3d ago |
| MedMCQA (test) | | Accuracy 84.13 | | 134 | 3d ago |
| MedQA | Multi-Agent Medical Decision Consensus Matrix System | Accuracy 91.7 | | 109 | 3d ago |
| MedExpQA | | Accuracy (English) 88.48 | | 61 | 3d ago |
| PubMedQA | HuatuoGPT-o1-70B | Accuracy 81.4 | | 45 | 3d ago |
| MedicalQA | Symphony-Coord | Accuracy 86 | | 33 | 3d ago |
| DDXPlus | Multi-Agent Medical Decision Consensus Matrix System | Accuracy 86.5 | | 28 | 3d ago |
| CV-MedExQA (test) | AU-probe | AUROC 0.9987 | | 28 | 3d ago |
| CV-MedMCQA (test) | AU-probe | AUROC 0.9999 | | 28 | 3d ago |
| CV-MedQA (test) | AU-probe | AUROC 0.9998 | | 28 | 3d ago |
| Medical QA Evaluation Suite (MedQA, MedMCQA, MMLU-Med, PubMedQA, BioASQ, SEER, DDXPlus, MIMIC-IV) | SPO Planning | MedQA Score 77.45 | | 27 | 3d ago |
| MedConceptsQA | | Accuracy 94.27 | | 26 | 3d ago |
| MVME (test) | GPT4o | ETS 8.46 | | 23 | 3d ago |
| MedXpertQA (test) | GPT4o | ETS Score 8.49 | | 23 | 3d ago |
| MMLU Med | ReFilter | Accuracy 82.92 | | 20 | 3d ago |
| BioASQ | ReFilter | Accuracy 80.74 | | 20 | 3d ago |
| Medical QA Benchmarks (MedQA, MedMCQA, MMLU*, CMB, CMExam, CMMLU*) (test) | KaFT | MedQA Accuracy 64.1 | | 20 | 3d ago |
| Medical TF-QA (test) | PubMedBERT + MA-X | Accuracy 85 | | 18 | 3d ago |
| Medical Benchmarks (MedQA, MedMCQA, BULLET) (test) | C3oT | MedQA Accuracy 0.5533 | | 18 | 3d ago |
| Medical Text QA Suite (MMLU-Med, PubMedQA, MedMCQA, MedQA, Medbullets, MedXpertQA, SGPQA) | | MMLU-Med 91.3 | | 17 | 3d ago |
| HealthBench Overall | Baichuan-M2-32B | Overall Score 60.1 | | 16 | 3d ago |
| HealthBench Hard | Baichuan-M2-32B | Score 34.7 | | 16 | 3d ago |
| MedXpert (test) | Deepseek-v3.2-685B | Accuracy 43.84 | | 15 | 3d ago |
| CMExam (test) | Deepseek-v3.2-685B | Accuracy 91.84 | | 14 | 3d ago |
| MedQA 5-options Full (test) | MDAgents | Accuracy 83.6 | | 14 | 3d ago |