# A multi-modal vision-language model for generalizable annotation-free pathology localization

## About
Existing deep learning models for identifying pathologies in clinical imaging data rely on expert annotations and generalize poorly in open clinical environments. Here we present AFLoc, a generalizable vision-language model for Annotation-Free pathology Localization. The core strength of AFLoc is its extensive multi-level, semantic-structure-based contrastive learning, which comprehensively aligns multi-granularity medical concepts with abundant image features, allowing the model to adapt to diverse expressions of pathologies without relying on expert image annotations. We conduct primary experiments on a dataset of 220K chest X-ray image-report pairs and validate across eight external datasets encompassing 34 types of chest pathologies. The results demonstrate that AFLoc outperforms state-of-the-art methods in both annotation-free localization and classification tasks. We additionally assess the generalizability of AFLoc on other modalities, including histopathology and retinal fundus images, and show that it exhibits robust generalization, even surpassing human benchmarks in localizing five different types of pathological images. These results highlight the potential of AFLoc to reduce annotation requirements and its applicability in complex clinical environments.
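The multi-level alignment described above can be made concrete with a short sketch. The PyTorch snippet below illustrates one plausible reading: word-, sentence-, and report-level text features are each contrasted against local (patch) or global image features, so pathology localization can later be read off the word-patch similarity map. The function names, tensor shapes, attention pooling, and symmetric InfoNCE formulation are our illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of multi-level image-text contrastive alignment.
# All shapes, pooling choices, and the loss formulation are assumptions
# made for illustration; this is not the released AFLoc code.
import torch
import torch.nn.functional as F


def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired embeddings of shape (B, D)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                     # (B, B) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)   # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


def attention_pool(queries: torch.Tensor, patches: torch.Tensor) -> torch.Tensor:
    """Pool patch features (B, P, D) under attention from text queries (B, Q, D)."""
    scale = patches.size(-1) ** 0.5
    attn = torch.softmax(queries @ patches.transpose(1, 2) / scale, dim=-1)  # (B, Q, P)
    return attn @ patches                                                    # (B, Q, D)


def multi_level_loss(img_local, img_global, txt_word, txt_sent, txt_report):
    """Align word-, sentence-, and report-level text features with image features.

    img_local:  (B, P, D) patch features    img_global: (B, D) pooled image feature
    txt_word:   (B, W, D) word features     txt_sent:   (B, S, D) sentence features
    txt_report: (B, D) report-level feature
    """
    # Fine granularity: words vs. attention-pooled local patches.
    loss_word = info_nce(attention_pool(txt_word, img_local).mean(1), txt_word.mean(1))
    # Intermediate granularity: sentences vs. attention-pooled local patches.
    loss_sent = info_nce(attention_pool(txt_sent, img_local).mean(1), txt_sent.mean(1))
    # Coarse granularity: whole report vs. global image feature.
    loss_report = info_nce(img_global, txt_report)
    return loss_word + loss_sent + loss_report
```

Under these assumptions, annotation-free localization would follow at inference time without any box or mask supervision: the attention weights between a pathology phrase and the image patches, upsampled to image resolution, act as a localization heatmap.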
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Medical Image Classification | COVID | Accuracy | 87.8 | 54 |
| Image Classification | NIH ChestX-ray | Accuracy | 83.87 | 21 |
| Classification | RSNA Pneumonia | Accuracy | 73.52 | 21 |
| Image-Text Retrieval | MIMIC 5x200 | Precision@1 | 54.37 | 15 |
| Classification | MIMIC 5x200 | Accuracy | 76.2 | 15 |
| Phrase grounding | MS-CXR | Atelectasis Accuracy | 0.7941 | 15 |