| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Medical Question Answering | MedQA | Accuracy | 89.41 | 154 |
| Medical Question Answering | MedQA | Accuracy | 93.88 | 153 |
| Medical Question Answering | MedQA | Accuracy | 86.5 | 124 |
| Question Answering | MedQA-USMLE (test) | Accuracy | 94.34 | 101 |
| Question Answering | MedQA | Accuracy | 94.8 | 96 |
| Question Answering | MedQA | Accuracy | 71.72 | 86 |
| Question Answering | MedQA (test) | Accuracy | 89.55 | 67 |
| Question Answering | MedQA | F1 Score (%) | 72 | 50 |
| Medical Reasoning | MedQA | Accuracy | 92.8 | 47 |
| Medical Question Answering | MedQA US | Accuracy | 90.42 | 43 |
| Multiple Choice Question Answering | MedQA | Accuracy | 50.98 | 39 |
| Question Answering | MedQA standard (test) | Accuracy | 94 | 32 |
| Multi-Turn Medical Dialogue | MedQA | Accuracy | 68.69 | 32 |
| Medical Reasoning | MedQA | Accuracy | 83.76 | 30 |
| Speculative Decoding | MedQA | Match Rate (MAT) | 6.47 | 30 |
| Medical Question Answering | MedQA | Decision-Useful Rate | 89.8 | 30 |
| Question Answering | MedQA | Accuracy | 85.8 | 28 |
| Question Answering | MedQA USMLE | Accuracy | 87.4 | 27 |
| Multiple Choice Question Answering | MedQA 5 opts | Accuracy | 87 | 26 |
| Medical Diagnosis | MedQA agent | Rounds | 9.11 | 25 |
| Medical Decision-Making | MedQA | Accuracy | 92.59 | 23 |
| Information Retrieval | MedQA | nDCG@10 | 77.9 | 23 |
| Medical Question Answering | MedQA | Accuracy | 78.3 | 21 |
| Question Answering | MedQA (dev) | Accuracy | 77.6 | 21 |
| General and STEM Reasoning | MedQA | Pass@1 | 86.8 | 20 |