| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Medical Question Answering | DDXPlus | Accuracy | 86.5 | 43 |
| Medical Reasoning | DDXPlus | Accuracy (DDXPlus) | 77.9 | 17 |
| Medical Reasoning | DDXPlus | Token Cost | 2,383 | 11 |
| Medical Reasoning | DDXPlus | Performance Score | 90.2 | 11 |
| Automated Medical Diagnosis | DDXPlus (test) | IL | 25.75 | 9 |
| Medical Differential Diagnoses | DDXPlus | Avg Correct | 79 | 8 |
| Privacy Rewriting | DDXPlus Pri | Accuracy | 87.6 | 7 |
| Confidence Estimation | DDXPlus | AUROC | 0.795 | 7 |
| Calibration | DDXPlus | Top-1 ECE | 0.01 | 4 |
| Classification | DDXPlus | Accuracy | 50.1 | 4 |
| Synthetic Data Utility | DDXPlus | Overall Score | 99.9 | 3 |
| Privacy Evaluation | DDXPlus | Overall Score | 100 | 3 |
| Synthetic Data Detection | DDXPlus | Overall Score | 37.7 | 3 |
| Tabular Data Synthesis | DDXPlus | Overall Score | 97.2 | 3 |