
UrFound: Towards Universal Retinal Foundation Models via Knowledge-Guided Masked Modeling

About

Retinal foundation models aim to learn generalizable representations from diverse retinal images, facilitating label-efficient model adaptation across various ophthalmic tasks. Despite their success, current retinal foundation models are generally restricted to a single imaging modality, such as Color Fundus Photography (CFP) or Optical Coherence Tomography (OCT), limiting their versatility. Moreover, these models may struggle to fully leverage expert annotations and overlook the valuable domain knowledge essential for domain-specific representation learning. To overcome these limitations, we introduce UrFound, a retinal foundation model designed to learn universal representations from both multimodal retinal images and domain knowledge. UrFound is equipped with a modality-agnostic image encoder and accepts either CFP or OCT images as input. To integrate domain knowledge into representation learning, we encode expert annotations as text supervision and propose a knowledge-guided masked modeling strategy for model pre-training. It involves reconstructing randomly masked patches of retinal images while predicting masked text tokens conditioned on the corresponding retinal image. This approach aligns multimodal images and textual expert annotations within a unified latent space, facilitating generalizable and domain-specific representation learning. Experimental results demonstrate that UrFound exhibits strong generalization ability and data efficiency when adapting to various tasks in retinal image analysis. Trained on ~180k retinal images, UrFound significantly outperforms the state-of-the-art retinal foundation model trained on up to 1.6 million unlabelled images across 8 public retinal datasets. Our code and data are available at https://github.com/yukkai/UrFound.
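The pre-training objective described above combines two masked-modeling terms: a masked image modeling (MIM) loss that reconstructs randomly masked patches, and a masked language modeling (MLM) loss that predicts masked text tokens conditioned on the image. The sketch below illustrates that structure in plain NumPy; it is not the authors' implementation, and the mask ratios, dimensions, and placeholder predictors are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(n, mask_ratio, rng):
    """Boolean mask selecting a random subset of positions (patches or tokens)."""
    n_mask = max(1, int(n * mask_ratio))
    mask = np.zeros(n, dtype=bool)
    mask[rng.permutation(n)[:n_mask]] = True
    return mask

# Toy inputs: an image as 16 patch embeddings and a caption as 8 token ids.
patches = rng.normal(size=(16, 32))       # 16 patches, 32-dim each (assumed)
vocab = 100
text_ids = rng.integers(0, vocab, size=8)

img_mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)
txt_mask = random_mask(len(text_ids), mask_ratio=0.50, rng=rng)

# (1) Masked image modeling: MSE between reconstructed and true masked patches.
#     A zero-output placeholder stands in for the decoder here.
recon = np.zeros_like(patches)
mim_loss = np.mean((recon[img_mask] - patches[img_mask]) ** 2)

# (2) Image-conditioned masked language modeling: cross-entropy on masked
#     tokens. A uniform placeholder predictor stands in for the text decoder,
#     which in the real model would attend to the image features.
probs = np.full((txt_mask.sum(), vocab), 1.0 / vocab)
mlm_loss = -np.mean(np.log(probs[np.arange(txt_mask.sum()), text_ids[txt_mask]]))

# Knowledge-guided masked modeling optimizes both terms jointly.
total_loss = mim_loss + mlm_loss
```

With real encoders, gradients from both terms flow into the shared modality-agnostic image encoder, which is what ties image reconstruction and text prediction into one latent space.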

Kai Yu, Yang Zhou, Yang Bai, Zhi Da Soh, Xinxing Xu, Rick Siow Mong Goh, Ching-Yu Cheng, Yong Liu • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|------|---------|--------|--------|------|
| OCT classification | Seven downstream OCT classification datasets (test) | BAcc | 78.6 | 8 |
| Glaucoma Classification | GF | AUROC | 0.958 | 7 |
| Diabetic Retinopathy Classification | MESSIDOR-2 | AUROC | 0.882 | 7 |
| Multicategory Classification | Retina | AUROC | 90.1 | 7 |
| Multicategory Classification | JSIEC | AUROC | 99.5 | 7 |
| Diabetic Retinopathy (DR) Prediction | in-house dataset (test) | ROC AUC | 0.971 | 7 |
| Age-Related Macular Degeneration (AMD) Prediction | in-house dataset (test) | ROC AUC | 0.7 | 7 |
| Coronary Artery Calcium (CAC) Prediction | in-house dataset (test) | ROC AUC | 86 | 7 |
| Estimated Glomerular Filtration Rate (eGFR) Prediction | in-house dataset (test) | ROC AUC | 0.572 | 7 |
| Glaucoma Classification | PAPILA | AUROC | 78.3 | 7 |
(Showing 10 of 19 benchmark rows.)
