
MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment

About

Dermatological diagnosis represents a complex multimodal challenge that requires integrating visual features with specialized clinical knowledge. While vision-language pretraining (VLP) has advanced medical AI, its effectiveness in dermatology is limited by text length constraints and the lack of structured texts. In this paper, we introduce MAKE, a Multi-Aspect Knowledge-Enhanced vision-language pretraining framework for zero-shot dermatological tasks. Recognizing that comprehensive dermatological descriptions require multiple knowledge aspects that exceed standard text constraints, our framework introduces: (1) a multi-aspect contrastive learning strategy that uses large language models to decompose clinical narratives into knowledge-enhanced sub-captions, (2) a fine-grained alignment mechanism that connects sub-captions with diagnostically relevant image features, and (3) a diagnosis-guided weighting scheme that adaptively prioritizes different sub-captions based on a clinical significance prior. Through pretraining on 403,563 dermatological image-text pairs collected from educational resources, MAKE significantly outperforms state-of-the-art VLP models on eight datasets across zero-shot skin disease classification, concept annotation, and cross-modal retrieval tasks. Our code will be made publicly available at https://github.com/SiyuanYan1/MAKE.
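To make the multi-aspect idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each image has one sub-caption embedding per knowledge aspect, computes a standard image-to-text InfoNCE loss per aspect, and combines the losses with fixed per-aspect weights. The function names (`info_nce`, `multi_aspect_loss`), the temperature value, and the fixed weights are all hypothetical; the abstract describes an adaptive, diagnosis-guided weighting whose exact form is not given here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(image_embs, text_embs, temperature=0.07):
    """Image-to-text InfoNCE loss over a batch; image i is paired with text i."""
    losses = []
    for i, img in enumerate(image_embs):
        logits = [cosine(img, txt) / temperature for txt in text_embs]
        # Numerically stable log-sum-exp over all texts in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


def multi_aspect_loss(image_embs, aspect_text_embs, aspect_weights, temperature=0.07):
    """Weighted average of per-aspect contrastive losses.

    aspect_text_embs[k][i] is the embedding of the k-th sub-caption of sample i.
    aspect_weights[k] is a hypothetical fixed weight standing in for the
    diagnosis-guided weighting described in the abstract.
    """
    total = sum(aspect_weights)
    return sum(
        (w / total) * info_nce(image_embs, texts, temperature)
        for w, texts in zip(aspect_weights, aspect_text_embs)
    )
```

Under these assumptions, raising the weight of an aspect whose sub-captions align well with the images lowers the combined loss, which is the intuition behind prioritizing clinically significant sub-captions.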

Siyuan Yan, Xieji Li, Ming Hu, Yiwen Jiang, Zhen Yu, Zongyuan Ge • 2025

Related benchmarks

Task                          Dataset            Metric      Result   Rank
Concept Annotation            SkinCon (test)     AUROC       0.7873   11
Disease Classification        DermNet (test)     Accuracy    82.66    11
Disease Classification        F17K (test)        Accuracy    0.3242   11
Disease Classification        SD-128 (test)      Accuracy    39.14    11
Disease Classification        SNU-134 (test)     Accuracy    32.7     11
Image-to-Text Retrieval       SkinCAP            Recall@10   0.2096   11
Text-to-Image Retrieval       SkinCAP            Recall@10   19.95    11
Disease Classification        PAD (test)         Accuracy    0.5953   11
Concept Annotation            Derm7pt (test)     AUROC       68.64    11
Medical Image Classification  PH2 Dermoscopy 2   W_F1        88.2     6
Showing 10 of 15 rows
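
For readers unfamiliar with the retrieval metric, here is a minimal sketch of how Recall@K is commonly computed from a query-by-candidate similarity matrix. It assumes exactly one correct candidate per query, located at the same index as the query; that pairing convention and the function name `recall_at_k` are assumptions for illustration, not details of the benchmark's evaluation code.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose matching candidate ranks in the top k.

    similarity[q][c] is the score between query q and candidate c.
    Assumes the ground-truth match for query q is candidate q.
    """
    hits = 0
    for q, scores in enumerate(similarity):
        # Candidate indices sorted by descending similarity to query q.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if q in ranked[:k]:
            hits += 1
    return hits / len(similarity)
```

The function returns a fraction between 0 and 1; multiply by 100 to express it as a percentage.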
