CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
About
This paper proposes Comprehensive Pathology Language Image Pre-training (CPLIP), a new unsupervised technique designed to enhance the alignment of images and text in histopathology for tasks such as classification and segmentation. This methodology enriches vision-language models by leveraging extensive data without needing ground truth annotations. CPLIP involves constructing a pathology-specific dictionary, generating textual descriptions for images using language models, and retrieving relevant images for each text snippet via a pre-trained model. The model is then fine-tuned using a many-to-many contrastive learning method to align complex interrelated concepts across both modalities. Evaluated across multiple histopathology tasks, CPLIP shows notable improvements in zero-shot learning scenarios, outperforming existing methods in both interpretability and robustness and setting a higher benchmark for the application of vision-language models in the field. To encourage further research and replication, the code for CPLIP is available on GitHub at https://cplip.github.io/
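The abstract does not spell out the many-to-many contrastive objective, so the following is only a minimal sketch of one common way to write such a loss: a symmetric, multi-positive InfoNCE in which every image and every text that share a concept group count as positives for each other. The function names, the group-label scheme, the temperature value, and the toy embeddings are all assumptions made for illustration and are not taken from the CPLIP code.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def multi_positive_nce(anchors, anchor_groups, candidates, candidate_groups,
                       temperature=0.07):
    """Mean over anchors of -log(sum_pos exp(s/t) / sum_all exp(s/t)).

    Hypothetical helper: a candidate is a positive for an anchor when both
    carry the same group label, so one anchor can have many positives.
    """
    anchors = [normalize(a) for a in anchors]
    candidates = [normalize(c) for c in candidates]
    total = 0.0
    for a, group in zip(anchors, anchor_groups):
        logits = [sum(x * y for x, y in zip(a, c)) / temperature
                  for c in candidates]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - top) for l in logits]
        pos = sum(e for e, g in zip(exps, candidate_groups) if g == group)
        if pos == 0.0:
            raise ValueError("every anchor needs at least one positive")
        total += -math.log(pos / sum(exps))
    return total / len(anchors)


def many_to_many_loss(images, image_groups, texts, text_groups,
                      temperature=0.07):
    """Average the image-to-text and text-to-image directions."""
    i2t = multi_positive_nce(images, image_groups, texts, text_groups,
                             temperature)
    t2i = multi_positive_nce(texts, text_groups, images, image_groups,
                             temperature)
    return 0.5 * (i2t + t2i)


# Toy 2-D embeddings (made up): group 0 points along x, group 1 along y.
images = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
image_groups = [0, 0, 1]
texts = [[1.0, 0.1], [0.8, 0.0], [0.1, 1.0], [0.0, 0.9]]
text_groups = [0, 0, 1, 1]

aligned = many_to_many_loss(images, image_groups, texts, text_groups)
# Swapping the text labels pairs each image with the wrong descriptions.
misaligned = many_to_many_loss(images, image_groups, texts, [1, 1, 0, 0])
print(f"aligned={aligned:.4f} misaligned={misaligned:.4f}")
```

With these toy inputs the loss is lower when image and text groups agree than when the text labels are swapped, which is the behaviour a fine-tuning objective of this kind would reward. A one-to-one CLIP-style loss is the special case in which every group contains exactly one image and one text.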
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | DigestPath (test) | DSC | 68.7 | 29 |
| Tile-level classification | PatchCamelyon | F1 | 56.7 | 24 |
| Tile-level classification | BACH | Weighted F1 Score | 56.3 | 22 |
| Tile-level classification | WSSS4LUAD | Weighted F1 Score | 88.2 | 16 |
| Tile-level classification | NCT-CRC | Weighted F1 Score | 84.4 | 16 |
| Tile-level classification | SICAP | Weighted Avg F1 | 0.511 | 16 |
| Tile-level classification | DigestPath | Weighted F1 Score | 90.7 | 16 |
| WSI-level classification | CAMELYON-16 | F1 (Weighted) | 63.2 | 16 |
| Tile-level classification | Databiox | Weighted F1 Score | 0.487 | 16 |
| Tile-level classification | SkinCancer | Weighted F1 Score | 47.6 | 16 |