| Dataset Name | SOTA Method | Metric | Value | Trend | |
|---|---|---|---|---|---|
| MedMCQA | ProtRLSearch | Accuracy | 90.4 | 521 | 1d ago |
| MedQA | Spurious Universal | Accuracy | 89.41 | 154 | 1d ago |
| MedQA | ToolTree | Accuracy | 93.88 | 153 | 2mo ago |
| MedMCQA (test) | | Accuracy | 84.13 | 134 | 3mo ago |
| MedQA | Med-PaLM 2 | Accuracy | 86.5 | 124 | 8d ago |
| PubMedQA | HuatuoGPT-o1-70B | Accuracy | 81.4 | 117 | 15d ago |
| MMLU Med | SEMA-RAG | Accuracy | 92.1 | 86 | 15d ago |
| MedExpQA | | Overall Accuracy | 86.19 | 70 | 3mo ago |
| PubMedQA | SafeLoRA | Accuracy | 82.8 | 65 | 20d ago |
| MedBullets | Multi-Agent Medical Decision Consensus Matrix System | Accuracy | 84.2 | 65 | 1mo ago |
| BioASQ | SEMA-RAG | Accuracy | 88.67 | 63 | 15d ago |
| MedMCQA | Llama-3.1-8B-Instruct | BLEU Score | 10.82 | 54 | 2mo ago |
| MedQA US | SEMA-RAG | Accuracy | 90.42 | 43 | 15d ago |
| DDXPlus | Multi-Agent Medical Decision Consensus Matrix System | Accuracy | 86.5 | 43 | 2mo ago |
| HealthBench Medicine N=5,000 (overall) | RVPO (k=0.5→2.0 / 1.0→2.0) | Rubric Score | 26.1 | 36 | 26d ago |
| MedicalQA | Symphony-Coord | Accuracy | 86 | 33 | 3mo ago |
| PubMedQA | MedXIAOHE | Pass@1 | 86 | 32 | 6d ago |
| MedXpertQA | MA-RAG-ext | Accuracy | 22.2 | 31 | 2mo ago |
| HeadQA | Zero-Shot CoT | Accuracy | 92.2 | 30 | 2mo ago |
| Medec | BFRS | Accuracy | 69.2 | 30 | 2mo ago |
| MedCalc-Bench | Zero-Shot CoT | Accuracy | 35.3 | 30 | 2mo ago |
| MedQA | HCQR | Decision-Useful Rate | 89.8 | 30 | 2mo ago |
| Polish Board Certification Examinations | | Average Score | 69.2 | 30 | 2mo ago |
| MMLU-P | GPT-4.1-mini | Accuracy | 97.1 | 29 | 3mo ago |
| PubMedQA | SPPFT | Factual Accuracy (FA) | 95.63 | 28 | 1mo ago |