
Robust Fairness Vision-Language Learning for Medical Image Analysis

About

The advent of Vision-Language Models (VLMs) in medical image analysis has the potential to help process multimodal inputs and increase performance over traditional inference methods. However, given the clinical domain in which these models will be deployed, fairness and robustness are essential to ensure that the model performs reliably for every patient. In this paper, we introduce a framework for ensuring the robustness and fairness of VLMs. The framework modifies the loss function during training in two ways: it identifies and adjusts faulty image-text pairs through a Dynamic Bad Pair Mining algorithm, and it uses the Sinkhorn distance to ensure that the loss distributions of protected groups do not deviate from the distribution of the total loss. Experimental testing of our framework shows up to an 8.6% improvement in equity-scaled AUC.
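
To make the fairness term in the abstract more concrete, the following is a minimal sketch of a Sinkhorn (entropy-regularized optimal transport) distance between the per-sample losses of one protected group and the losses of the whole batch. It is an illustration under stated assumptions, not the authors' implementation: the function names `sinkhorn_distance` and `fairness_penalty`, the squared-difference cost, the uniform sample weights, and the values of `eps` and `n_iters` are all hypothetical choices, and the abstract does not say how the paper weights or combines this term with the training loss.

```python
import math


def sinkhorn_distance(xs, ys, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between two 1-D empirical samples.

    Assumptions (not from the paper): squared-difference ground cost,
    uniform weights on both samples, fixed number of Sinkhorn iterations.
    """
    n, m = len(xs), len(ys)
    cost = [[(x - y) ** 2 for y in ys] for x in xs]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a, b = 1.0 / n, 1.0 / m  # uniform marginals
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale columns and rows to match the marginals.
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [a / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
    # Transport plan entry is u[i] * kernel[i][j] * v[j]; return its total cost.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


def fairness_penalty(losses, groups):
    """Sum of Sinkhorn distances between each group's losses and all losses."""
    penalty = 0.0
    for g in sorted(set(groups)):
        group_losses = [l for l, gg in zip(losses, groups) if gg == g]
        penalty += sinkhorn_distance(group_losses, losses)
    return penalty


# Hypothetical per-sample losses for two protected groups "a" and "b".
groups = ["a", "a", "b", "b"]
balanced = fairness_penalty([0.20, 0.30, 0.25, 0.35], groups)
skewed = fairness_penalty([0.20, 0.30, 1.20, 1.30], groups)
print(balanced < skewed)
```

In this sketch the penalty grows when one group's losses drift away from the batch as a whole, which is the behaviour the abstract describes as keeping protected-group loss distributions close to the total loss. In a real training loop the same computation would need to be written with differentiable tensor operations so that gradients can flow through it.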

Sparsh Bansal, Mingyang Wu, Xin Wang, Shu Hu • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Medical Image Classification | Harvard-FairVLMed Linear Probing (test) | Overall F1 Score: 68.39 | 24 |
| Glaucoma Detection | Harvard-FairVLMed | DPD: 5.67 | 12 |
| Glaucoma Detection | FairFundus Age Partition (5 Fold Cross-Validation) | DPD: 2.93 | 6 |
| Medical Image Classification | FairFundus | F1 Score: 69.86 | 6 |
| Vision-Language Medical Classification | Harvard-FairVLMed Ethnicity (test) | F1 Score: 63.76 | 6 |
| Glaucoma Detection | FairFundus Gender Partition (5 Fold Cross-Validation) | DPD: 2.5 | 6 |
| Vision-Language Medical Classification | Harvard-FairVLMed Language (test) | F1 Score (Overall): 62.54 | 6 |
| Vision-Language Medical Classification | Harvard-FairVLMed Race (test) | F1: 63.08 | 6 |
| Vision-Language Medical Classification | Harvard-FairVLMed Gender (test) | F1: 62.48 | 6 |
