| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Medical Question Answering | Out-of-domain medical QA History, Engineering, Law (test) | History Accuracy: 50.4 | 10 |
| Medical Question Answering | Medical QA | GPT-4 Score: 92.5 | 9 |
| Medical Question Answering | Medical QA offline evaluation | Honesty: 0.83 | 3 |