Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography

About

The lack of large and diverse training data for Computer-Aided Diagnosis (CAD) in breast cancer detection is one of the concerns impeding the adoption of such systems. Recently, pre-training on large-scale image-text datasets via Vision-Language Models (VLMs) (e.g., CLIP) has partially addressed the issues of robustness and data efficiency in computer vision (CV). This paper proposes Mammo-CLIP, the first VLM pre-trained on a substantial amount of screening mammogram-report pairs, addressing the challenges of dataset diversity and size. Our experiments on two public datasets demonstrate strong performance in classifying and localizing various mammographic attributes crucial for breast cancer detection, showcasing data efficiency and robustness similar to CLIP in CV. We also propose Mammo-FActOR, a novel feature attribution method, to provide spatial interpretation of representation with sentence-level granularity within mammography reports. Code is publicly available at https://github.com/batmanlab/Mammo-CLIP.

Shantanu Ghosh, Clare B. Poynton, Shyam Visweswaran, Kayhan Batmanghelich • 2024

Related benchmarks

Task	Dataset	Result
Image Classification	RSNA (test)	AUC 91	59
Pathology Image Classification	BreakHis (test)	Top-1 Accuracy 86.97	46
Mammography Classification	VinDr	ROC AUC 0.858	35
Calcification Classification	VinDr	AUC 0.98	25
Density Classification	VinDr	Accuracy 88	25
Mass Classification	VinDr	AUC 0.88	25
Medical Image Classification	BUSI (test)	Accuracy 85.62	23
Calcification Localization	VinDr held-out (test)	mAP 35	20
Mass Localization	VinDr held-out (test)	mAP 0.58	20
Medical Imaging Classification	Shenzhen public (test)	Accuracy 88.64	9

Showing 10 of 12 rows
