| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Natural Language Generation | Merlin | BLEU-1 | 0.3898 | 14 |
| Medical Report Generation | Merlin (test) | RaTE Score | 35.64 | 14 |
| RadBERT classification | Merlin dataset | Micro Precision | 52.63 | 14 |
| Text-to-Image Retrieval | Merlin Impressions section (test) | Recall@1 | 43.2 | 12 |
| Text-to-Image Retrieval | Merlin Findings section (test) | Recall@1 | 77.6 | 12 |
| Text-to-image retrieval | Merlin (val) | Recall@1 (Findings) | 70.7 | 9 |
| Metric Correlation with Human Judgment | Merlin | Pearson Correlation | 0.369 | 7 |
| Text-to-Image Retrieval | Merlin Full Report (test) | Recall@1 | 66.4 | 6 |
| Medical finding classification | MERLIN (test) | AUROC | 0.83 | 6 |
| Classification | Merlin | AUROC | 81 | 5 |
| Abnormality diagnosis | Merlin (internal val) | F1 Score | 84.3 | 5 |
| Report retrieval | Merlin | Recall@1 | 88.8 | 4 |
| Disease Classification | Merlin (test) | Precision | 86.9 | 4 |
| Abdominal Pathology Classification | MERLIN N = 5,082 (Internal) | AUC | 88 | 4 |
| Medical Report Generation | Merlin | BLEU-2 | 9 | 3 |
| Report-to-volume retrieval | Merlin n=5,082 | Recall@5 | 22.5 | 3 |
| Quantitative Phenotyping | MERLIN | AP | 71.3 | 2 |
| Medical Report Generation | Merlin neg | BLEU-2 | 60.5 | 1 |
| Medical Report Generation | Merlin pos | BLEU-2 | 29.8 | 1 |