| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Medical Question Answering | MedMCQA | Accuracy | 90.4 | 521 |
| Medical Question Answering | MedMCQA (test) | Accuracy | 84.13 | 134 |
| Question Answering | MedMCQA | Accuracy | 64.67 | 98 |
| Medical Reasoning | MedMCQA | Accuracy | 86 | 58 |
| Medical Question Answering | MedMCQA | BLEU Score | 10.82 | 54 |
| Question Answering | MedMCQA | AUC | 75.95 | 51 |
| Question Answering | MedMCQA (test) | Test Error Rate | 0.163 | 48 |
| Hallucination Detection | MedMCQA | AUC | 75.57 | 42 |
| Multiple-choice Question Answering | MedMCQA | Accuracy | 88.9 | 42 |
| Data Selection | MedMCQA (fresh candidate pool) | Accuracy | 57.4 | 34 |
| Medical | MedMCQA | Accuracy (ACC) | 58.2 | 33 |
| Multi-Turn Medical Dialogue | MedMCQA | Accuracy | 63.31 | 32 |
| Medical Question Answering | MedMCQA | Pass@1 Accuracy | 53.6 | 28 |
| Medical Knowledge Editing | MedMCQA edit | Efficacy | 51 | 18 |
| Question Answering | MedMCQA | R@1 | 72.33 | 15 |
| Machine Unlearning | MedMCQA QF=1000 | Forget Accuracy | 90 | 14 |
| LLM Routing | MEDMCQA (val) | Top-1 Acc | 96.3 | 14 |
| LLM Routing | MedMCQA | Top-1 Acc | 96.3 | 14 |
| Clinical Question Answering | MedMCQA | Accuracy | 86.1 | 14 |
| Medical Question Answering | MedMCQA | Tau Correlation | 4.3 | 13 |
| Medical information extraction and understanding | MedMCQA | Perplexity (PPL) | 3.28 | 12 |
| Medical Reasoning | MedMCQA | Token Cost (tokens/question) | 1,047 | 11 |
| Question Answering | MedMCQA (dev) | Accuracy | 0.791 | 11 |
| Biomedical Question Answering | MedMCQA In-Domain (test) | Accuracy | 90 | 10 |
| Question Answering | MedMCQA 1,000-example evaluation slice | Accuracy | 39.8 | 9 |