| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Medical Question Answering | MedMCQA | Accuracy | 90.4 | 346 |
| Medical Question Answering | MedMCQA (test) | Accuracy | 84.13 | 134 |
| Medical Question Answering | MedMCQA | BLEU Score | 10.82 | 54 |
| Question Answering | MedMCQA (test) | Test Error Rate | 0.163 | 48 |
| Multi-Turn Medical Dialogue | MedMCQA | Accuracy | 63.31 | 32 |
| Medical | MedMCQA | Accuracy (ACC) | 58.2 | 21 |
| Medical Knowledge Editing | MedMCQA edit | Efficacy | 51 | 18 |
| Machine Unlearning | MedMCQA QF=1000 | Forget Accuracy | 90 | 14 |
| LLM Routing | MEDMCQA (val) | Top-1 Acc | 96.3 | 14 |
| LLM Routing | MedMCQA | Top-1 Acc | 96.3 | 14 |
| Clinical Question Answering | MedMCQA | Accuracy | 86.1 | 14 |
| Medical Question Answering | MedMCQA | Tau Correlation | 4.3 | 13 |
| Multiple-choice Question Answering | MedMCQA | Accuracy | 40.97 | 12 |
| Medical Reasoning | MedMCQA | Token Cost (tokens/question) | 1,047 | 11 |
| Medical Reasoning | MedMCQA | Accuracy | 86 | 11 |
| Question Answering | MedMCQA (dev) | Accuracy | 0.791 | 11 |
| Medical Question Answering | MedMCQA | Pass@1 Accuracy | 53.6 | 10 |
| Biomedical Question Answering | MedMCQA In-Domain (test) | Accuracy | 90 | 10 |
| Question Answering | MedMCQA | FDR (%) | 6.43 | 9 |
| Medical Question Answering | MedMCQA translated (test) | Accuracy (ZH) | 43.2 | 9 |
| Question Answering | MedMCQA | Accuracy | 64.3 | 8 |
| Medical Reasoning | MedMCQA OOD (out-of-distribution) | Accuracy | 66.2 | 7 |
| Out-of-Distribution Detection | MedMCQA Far-Domain | AUROC | 84.4 | 7 |
| Missingness Bias Reduction | MedMCQA | KL Divergence | 1.13 | 7 |
| Question Answering | MedMCQA (val) | Accuracy | 90 | 7 |