
MERLIN

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Natural Language Generation | Merlin | BLEU-1: 0.3898 | 14 |
| Medical Report Generation | Merlin (test) | RaTE Score: 35.64 | 14 |
| RadBERT classification | Merlin dataset | Micro Precision: 52.63 | 14 |
| Text-to-Image Retrieval | Merlin Impressions section (test) | Recall@1: 43.2 | 12 |
| Text-to-Image Retrieval | Merlin Findings section (test) | Recall@1: 77.6 | 12 |
| Text-to-image retrieval | Merlin (val) | Recall@1 (Findings): 70.7 | 9 |
| Metric Correlation with Human Judgment | Merlin | Pearson Correlation: 0.369 | 7 |
| Text-to-Image Retrieval | Merlin Full Report (test) | Recall@1: 66.4 | 6 |
| Medical finding classification | MERLIN (test) | AUROC: 0.83 | 6 |
| Classification | Merlin | AUROC: 81 | 5 |
| Abnormality diagnosis | Merlin (internal val) | F1 Score: 84.3 | 5 |
| Report retrieval | Merlin | Recall@1: 88.8 | 4 |
| Disease Classification | Merlin (test) | Precision: 86.9 | 4 |
| Abdominal Pathology Classification | MERLIN N = 5,082 (Internal) | AUC: 88 | 4 |
| Medical Report Generation | Merlin | BLEU-2: 9 | 3 |
| Report-to-volume retrieval | Merlin n=5,082 | Recall@5: 22.5 | 3 |
| Quantitative Phenotyping | MERLIN | AP: 71.3 | 2 |
| Medical Report Generation | Merlin neg | BLEU-2: 60.5 | 1 |
| Medical Report Generation | Merlin pos | BLEU-2: 29.8 | 1 |