| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Medical Knowledge Evaluation | MMedBench English subset (val) | Accuracy | 60.33 | 36 |
| Multilingual Medical Question Answering | MMedBench (test) | Accuracy (Chinese) | 83.07 | 20 |
| Medical Multi-choice QA | MMedBench (test) | Token Accuracy | 92.55 | 16 |
| Medical Multi-choice Question Answering | MMedBench (test) | Token Perplexity (log) | 0.1494 | 16 |
| Knowledge Boundary Expression | MMedBench (test) | F1 Score | 69.9 | 15 |
| Medical Question Answering | MMedBench 1.0 (test) | Accuracy (Chinese) | 84.47 | 9 |