| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Medical Question Answering | MedQA | Accuracy 91.7 | 109 |
| Question Answering | MedQA-USMLE (test) | Accuracy 94.34 | 101 |
| Question Answering | MedQA | Accuracy 94.8 | 70 |
| Question Answering | MedQA (test) | Accuracy 89.55 | 61 |
| Multiple Choice Question Answering | MedQA 5 opts | Accuracy 87 | 26 |
| Medical Diagnosis | MedQA agent | Rounds 9.11 | 25 |
| Multiple Choice Question Answering | MedQA | Accuracy 44.01 | 24 |
| Question Answering | MedQA (dev) | Accuracy 77.6 | 21 |
| Medical Knowledge | MedQA | Accuracy 92.8 | 20 |
| Correctness Prediction | MedQA | Accuracy 61.29 | 18 |
| Question Answering | MedQA USMLE | Accuracy 87.4 | 18 |
| Text Anonymization | MedQA | Privacy Score 24.6 | 16 |
| Prompt Leakage Attack | MedQA | ASR (500) 31.3 | 16 |
| Clinical Question Answering | NEJM-MedQA | Accuracy 86.7 | 14 |
| Clinical Question Answering | MedQA | Accuracy 91.4 | 14 |
| Medical Question Answering | MedQA 5-options Full (test) | Accuracy 83.6 | 14 |
| Medical Question Answering | NEJM-MedQA | Base Deviation 0.22 | 13 |
| Medical Question Answering | MedQA (M-QA) | Base Accuracy Std Dev 0.12 | 13 |
| Medical Question Answering | MedQA (test) | SR 3.26 | 12 |
| Medical Reasoning | MedQA | Accuracy 91.1 | 10 |
| Medical Multiple-Choice Question Answering | MedQA Chinese United States Medical Licensing Examination (test) | Accuracy 42.77 | 10 |
| Uncertainty Quantification | MedQA (test) | AUROC 0.635 | 9 |
| Medical Question Answering | MedQA MCMLE | Accuracy 83.95 | 8 |
| Medical Question Answering | MedQA US (4-option) | Accuracy 90.2 | 8 |
| Machine Unlearning | MedQA QF=100 | Forget Accuracy 91.33 | 7 |