| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Medical Question Answering | MedQA | Accuracy | 93.88 | 153 |
| Question Answering | MedQA-USMLE (test) | Accuracy | 94.34 | 101 |
| Question Answering | MedQA | Accuracy | 94.8 | 96 |
| Question Answering | MedQA (test) | Accuracy | 89.55 | 61 |
| Medical Question Answering | MedQA | Accuracy | 86.44 | 40 |
| Multiple Choice Question Answering | MedQA | Accuracy | 50.98 | 39 |
| Question Answering | MedQA standard (test) | Accuracy | 94 | 32 |
| Multi-Turn Medical Dialogue | MedQA | Accuracy | 68.69 | 32 |
| Medical Question Answering | MedQA | Decision-Useful Rate | 89.8 | 30 |
| Question Answering | MedQA | Accuracy | 85.8 | 28 |
| Multiple Choice Question Answering | MedQA 5 opts | Accuracy | 87 | 26 |
| Medical Diagnosis | MedQA agent | Rounds | 9.11 | 25 |
| Information Retrieval | MedQA | nDCG@10 | 77.9 | 23 |
| Question Answering | MedQA USMLE | Accuracy | 87.4 | 23 |
| Medical Reasoning | MedQA | Accuracy | 92.8 | 21 |
| Question Answering | MedQA (dev) | Accuracy | 77.6 | 21 |
| Medical Question Answering | MedQA 4-option original (test) | Accuracy | 95.1 | 20 |
| Medical Knowledge | MedQA | Accuracy | 92.8 | 20 |
| Medical Question Answering | MedQA US | Accuracy | 53.1 | 18 |
| Medical Question Answering | MedQA US (test) | Accuracy | 90.02 | 18 |
| Correctness Prediction | MedQA | Accuracy | 61.29 | 18 |
| Question Answering | MedQA | HS | 42.75 | 17 |
| Text Anonymization | MedQA | Privacy Score | 24.6 | 16 |
| Prompt Leakage Attack | MedQA | ASR (500) | 31.3 | 16 |
| Medical Question Answering | MedQA USMLE-style | Accuracy | 73.02 | 15 |