Debiasing Vision-Language Models via Biased Prompts
About
Machine learning models have been shown to inherit biases from their training datasets. This can be particularly problematic for vision-language foundation models trained on uncurated datasets scraped from the internet. The biases can be amplified and propagated to downstream applications like zero-shot classifiers and text-to-image generative models. In this study, we propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. In particular, we show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models. The proposed closed-form solution enables easy integration into large-scale pipelines, and empirical results demonstrate that our approach effectively reduces social bias and spurious correlation in both discriminative and generative vision-language models without the need for additional data or training.
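The core idea — removing the span of biased text directions from an embedding with a closed-form projection — can be sketched as follows. This is a minimal illustration of the orthogonal-projection step only; the function name `debias_projection` is hypothetical, and the paper's *calibrated* projection matrix involves an additional calibration objective not reproduced here.

```python
import numpy as np

def debias_projection(bias_directions):
    """Build P = I - B (B^T B)^{-1} B^T, which projects embeddings
    onto the orthogonal complement of the given bias directions.

    bias_directions: list of 1-D arrays (e.g. embeddings of biased
    prompts such as "a photo of a man" minus "a photo of a woman").
    """
    B = np.stack(bias_directions, axis=1)          # shape (d, k)
    P = np.eye(B.shape[0]) - B @ np.linalg.inv(B.T @ B) @ B.T
    return P

# Toy usage: project a text embedding so it carries no component
# along a (made-up) bias direction.
bias = np.array([1.0, 0.0, 0.0])
P = debias_projection([bias])
text_embedding = np.array([0.5, 0.2, 0.1])
debiased = P @ text_embedding                      # bias component removed
```

Because `P` is a projection (symmetric and idempotent), applying it to every prompt embedding is a single matrix multiply, which is what makes the approach cheap to drop into large-scale zero-shot classification or text-to-image pipelines.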
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Retrieval | Flickr30K | R@1 | 79.02 | 460 |
| Social Debiasing | FairFace Out-of-Domain | MaxSkew (MS) | 0.094 | 32 |
| Social Debiasing | FACET Out-of-Domain | MS | 0.417 | 32 |
| Zero-shot Image-Text Retrieval | Flickr | R@5 TR | 99.2 | 32 |
| Zero-shot Image Classification | ImageNet-1K | Top-1 Accuracy | 0.7753 | 32 |
| Holistic Social Debiasing Assessment | Alignment and Bias Level Evaluation (ABLE) | ABLE Score | 0.8244 | 32 |
| Social Debiasing | UTKFace In-Domain | MS | 0.089 | 32 |
| Multi-class Classification | FACET (test) | Accuracy | 56.37 | 15 |
| Text-to-Image Generation | Text-to-Image Generation Evaluation Set | Mismatch Rate (M/F) | 20.11 | 12 |
| Fair Image Retrieval | CelebA (test) | KL Divergence | 0.059 | 9 |