MM-Retinal: Knowledge-Enhanced Foundational Pretraining with Fundus Image-Text Expertise
About
Current fundus image analysis models are predominantly built for specific tasks and rely on individual datasets. Their learning process typically follows a data-driven paradigm without prior knowledge, resulting in poor transferability and generalizability. To address this issue, we propose MM-Retinal, a multi-modal dataset of high-quality image-text pairs collected from professional fundus diagram books. Moreover, enabled by MM-Retinal, we present a novel Knowledge-enhanced foundational pretraining model that incorporates Fundus Image-Text expertise, called KeepFIT. It is designed with image similarity-guided text revision and a mixed training strategy to infuse expert knowledge. The resulting fundus foundation model achieves state-of-the-art performance across six unseen downstream tasks and generalizes well in zero-shot and few-shot scenarios. MM-Retinal and KeepFIT are available at https://github.com/lxirich/MM-Retinal.
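To make the image similarity-guided text revision idea concrete, below is a minimal sketch (not the released KeepFIT implementation): for each training image, the expert caption whose paired image embedding is most similar under cosine similarity is retrieved from the MM-Retinal-style expert set, so its text can guide revision of the image's own description. The function names and the use of plain NumPy are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Normalize rows to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def similarity_guided_retrieval(batch_img_feats, expert_img_feats, expert_texts, top_k=1):
    """Hypothetical helper: for each batch image embedding, retrieve the
    expert caption(s) whose paired image embedding is most cosine-similar.
    In a KeepFIT-style pipeline, the retrieved expert text would then be
    used to revise/enrich the image's original description."""
    q = l2_normalize(np.asarray(batch_img_feats, dtype=float))
    k = l2_normalize(np.asarray(expert_img_feats, dtype=float))
    sims = q @ k.T                       # (batch, num_expert) cosine similarities
    top_idx = np.argsort(-sims, axis=1)[:, :top_k]
    return [[expert_texts[j] for j in row] for row in top_idx]

# Toy usage with made-up 3-D "embeddings" and captions:
expert_feats = np.eye(3)
expert_texts = ["drusen present", "retinal hemorrhage", "hard exudates"]
batch_feats = np.array([[0.9, 0.1, 0.0],   # closest to expert 0
                        [0.0, 0.0, 1.0]])  # closest to expert 2
retrieved = similarity_guided_retrieval(batch_feats, expert_feats, expert_texts)
```

In the actual model, the embeddings would come from the pretrained image encoder rather than raw vectors, and the retrieved text participates in the mixed training objective alongside the public dataset labels.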
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | ODIR 200x3 | ACA 89.3 | 60 |
| Classification | RFMiD (test) | AUC 81.52 | 18 |
| Retinal Fundus Image Classification | FIVES | AUC 90.62 | 18 |
| Classification | RIM-ONE | AUC 89.91 | 18 |
| Classification | REFUGE (test) | AUC 84.89 | 18 |
| Diabetic Macular Edema Detection | IDRiD DME | AUC 69.03 | 18 |
| Diabetic Retinopathy Detection | IDRiD DR | AUC 77.13 | 18 |
| Classification | OIA-DDR | AUC 79.42 | 18 |
| Classification | ADAM (test) | AUC 74.88 | 18 |
| Medical Image Classification | PALM | AUC 93.34 | 18 |