TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias

About

We identify a critical bias in contemporary CLIP-based models, which we denote as single tag bias. This bias manifests as a disproportionate focus on a single tag (word) while neglecting other pertinent tags, stemming from CLIP's text embeddings that prioritize one specific tag in image-text relationships. When deconstructing text into individual tags, only one tag tends to have high relevancy with CLIP's image embedding, leading to biased tag relevancy. In this paper, we introduce a novel two-step fine-tuning approach, Text-Tag Self-Distillation (TTD), to address this challenge. TTD first extracts image-relevant tags from text based on their similarity to the nearest pixels, then employs a self-distillation strategy to align combined masks with the text-derived mask. This approach ensures unbiased image-text alignment in CLIP-based models using only image-text pairs, without necessitating additional supervision. Our technique demonstrates model-agnostic improvements in multi-tag classification and segmentation tasks, surpassing competing methods that rely on external resources. The code is available at https://github.com/shjo-april/TTD.

Sanghyun Jo, Soohyun Ryu, Sungyub Kim, Eunho Yang, Kyungsu Kim • 2024
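
The two steps named in the abstract (scoring tags by their nearest pixel, then comparing the combined tag masks with the text-derived mask) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the threshold, the element-wise-max way of combining masks, and the L1 gap are all assumptions made for illustration, and the toy embeddings stand in for real CLIP features.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to (approximately) unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def tag_pixel_similarity(pixel_emb, tag_emb):
    # pixel_emb: (N, D) per-pixel embeddings, tag_emb: (T, D) per-tag embeddings.
    # Returns a (T, N) matrix of cosine similarities (one similarity map per tag).
    return l2_normalize(tag_emb) @ l2_normalize(pixel_emb).T

def select_tags(tags, sim, threshold):
    # Hypothetical rule: keep a tag if its most similar ("nearest") pixel
    # reaches the threshold.
    scores = sim.max(axis=1)
    return [i for i in range(len(tags)) if scores[i] >= threshold]

# Toy data (assumed): 4 pixels in a 3-dimensional embedding space.
pixel_emb = np.array([[2.0, 0.0, 0.0],   # two "dog-like" pixels
                      [1.0, 0.0, 0.0],
                      [0.0, 3.0, 0.0],   # two "frisbee-like" pixels
                      [0.0, 1.0, 0.0]])
tags = ["dog", "frisbee", "car"]
tag_emb = np.eye(3)                      # one orthogonal direction per tag

sim = tag_pixel_similarity(pixel_emb, tag_emb)
keep_idx = select_tags(tags, sim, threshold=0.5)
selected = [tags[i] for i in keep_idx]

# Combine the kept tags' similarity maps (element-wise max, an assumption).
union = sim[keep_idx].max(axis=0)

# Text-derived map: similarity of each pixel to a single embedding of the
# whole text, faked here as the sum of the "dog" and "frisbee" directions.
text_emb = l2_normalize(tag_emb[0] + tag_emb[1])
text_mask = l2_normalize(pixel_emb) @ text_emb

# Illustrative L1 gap between the two maps; a self-distillation objective
# would shrink a discrepancy of this kind, though the paper's loss may differ.
gap = float(np.abs(union - text_mask).mean())
print(selected, round(gap, 4))
```

In this toy setup the irrelevant tag "car" is filtered out because no pixel is close to it, and the gap is nonzero because the whole-text embedding matches each region less strongly than the individual tags do.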

Related benchmarks

Task | Dataset | Result | Rank
Referring Expression Segmentation | RefCOCO (testA) | -- | 217
Referring Expression Segmentation | RefCOCO+ (val) | -- | 201
Referring Expression Segmentation | RefCOCO (testB) | -- | 191
Referring Expression Segmentation | RefCOCO (val) | -- | 190
Referring Expression Segmentation | RefCOCO+ (testA) | -- | 190
Referring Expression Segmentation | RefCOCO+ (testB) | -- | 188
Multi-Label Classification | NUS-WIDE (test) | mAP 42.63 | 112
Referring Expression Segmentation | RefCOCOg (val) | -- | 107
Open Vocabulary Semantic Segmentation | PASCAL Context Context60 with background | mIoU 37.4 | 28
Open Vocabulary Semantic Segmentation | ADE20K without background | mIoU 17 | 28
Showing 10 of 17 rows
