Boosting Medical Visual Understanding From Multi-Granular Language Learning
About
Recent advances in image-text pretraining have significantly enhanced visual understanding by aligning visual and textual representations. Contrastive Language-Image Pretraining (CLIP) has played a pivotal role in multimodal learning. However, its focus on single-label, single-granularity alignment limits its effectiveness in complex domains such as medical imaging, where images often correspond to multiple high-level labels (e.g., disease categories) across different annotation granularities (e.g., diagnostic description, clinical explanation). To address this, we propose Multi-Granular Language Learning (MGLL), a contrastive learning framework designed to improve both multi-label and cross-granularity alignment. MGLL leverages structured multi-label supervision, integrates textual descriptions across granularities, and introduces soft-label supervision with point-wise constraints to enhance alignment. It employs a smoothed Kullback-Leibler (KL) divergence to enforce cross-granularity consistency while remaining computationally efficient as a plug-and-play module for vision-language models. Pretrained on our constructed large-scale multi-granular datasets and evaluated on multiple downstream benchmarks, MGLL outperforms other state-of-the-art methods. The code is available at https://github.com/HUANGLIZI/MGLL.
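To make the soft-label idea concrete, below is a minimal NumPy sketch, not the authors' implementation: the function name, the use of label overlap as the soft target, and all hyperparameters are illustrative assumptions. It builds soft targets by normalizing the multi-label overlap between image-text pairs in a batch, then applies a smoothed KL divergence between those targets and the image-to-text similarity distribution.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_label_kl_loss(img_emb, txt_emb, label_overlap, temperature=0.07, eps=1e-8):
    """Illustrative soft-label contrastive loss (assumed form, not the paper's exact loss).

    img_emb, txt_emb: (batch, dim) embeddings from the two encoders.
    label_overlap:    (batch, batch) matrix, e.g. Y @ Y.T for multi-hot labels Y,
                      counting shared labels between each image-text pair.
    """
    # Cosine-similarity logits between L2-normalized embeddings.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    pred = softmax(img @ txt.T / temperature, axis=1)

    # Soft targets: rows of the label-overlap matrix normalized to sum to 1,
    # so texts sharing more labels with an image receive more probability mass.
    target = label_overlap / (label_overlap.sum(axis=1, keepdims=True) + eps)

    # Smoothed KL(target || pred), averaged over the batch; eps avoids log(0).
    kl = target * (np.log(target + eps) - np.log(pred + eps))
    return float(kl.sum() / img.shape[0])
```

In use, `label_overlap` would come from the structured multi-label annotations (e.g. `Y @ Y.T` for a multi-hot label matrix `Y`), and one such term could be computed per text granularity to encourage cross-granularity consistency.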
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | COVIDx | Accuracy | 39 | 57 |
| Medical Image Segmentation | COVID-Xray | Dice | 81.69 | 23 |
| Classification | OIA-DDR | AUC | 88.85 | 18 |
| Classification | RIM-ONE | AUC | 97.05 | 18 |
| Classification | RFMiD (test) | AUC | 92.83 | 18 |
| Classification | ADAM (test) | AUC | 0.963 | 18 |
| Classification | REFUGE (test) | AUC | 93.9 | 18 |
| Diabetic Macular Edema Detection | IDRiD DME | AUC | 86.17 | 18 |
| Diabetic Retinopathy Detection | IDRiD DR | AUC | 82.57 | 18 |
| Medical Image Classification | PALM | AUC | 99.72 | 18 |