
Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation

About

Recent open-vocabulary segmentation methods adopt mask generators to predict segmentation masks and leverage pre-trained vision-language models, e.g., CLIP, to classify these masks via mask pooling. Although these approaches show promising results, it is counterintuitive that accurate masks often fail to yield accurate classification results through pooling CLIP image embeddings within the mask regions. In this paper, we reveal the performance limitations of mask pooling and introduce Mask-Adapter, a simple yet effective method to address these challenges in open-vocabulary segmentation. Compared to directly using proposal masks, our proposed Mask-Adapter extracts semantic activation maps from proposal masks, providing richer contextual information and ensuring alignment between masks and CLIP. Additionally, we propose a mask consistency loss that encourages proposal masks with similar IoUs to obtain similar CLIP embeddings to enhance models' robustness to varying predicted masks. Mask-Adapter integrates seamlessly into open-vocabulary segmentation methods based on mask pooling in a plug-and-play manner, delivering more accurate classification results. Extensive experiments across several zero-shot benchmarks demonstrate significant performance gains for the proposed Mask-Adapter on several well-established methods. Notably, Mask-Adapter also extends effectively to SAM and achieves impressive results on several open-vocabulary segmentation datasets. Code and models are available at https://github.com/hustvl/MaskAdapter.
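To make the baseline concrete, here is a minimal sketch of mask pooling followed by classification against text embeddings. It is not taken from the authors' repository: the function names `mask_pool` and `classify_mask` are hypothetical, NumPy arrays stand in for CLIP image and text embeddings, and the cosine-similarity scoring is an assumption about how pooled embeddings are typically matched to class names.

```python
import numpy as np

def mask_pool(features, mask, eps=1e-6):
    """Average the feature vectors that fall inside a mask.

    features: (C, H, W) dense image embeddings (stand-in for CLIP features).
    mask:     (H, W) weights in [0, 1]; a binary proposal mask is the usual case.
    Returns a (C,) pooled embedding.
    """
    weights = mask.astype(features.dtype)
    # Weighted sum over spatial positions, normalized by the mask area.
    return (features * weights[None]).sum(axis=(1, 2)) / (weights.sum() + eps)

def classify_mask(features, mask, text_embeddings, eps=1e-6):
    """Score one mask against class text embeddings with cosine similarity.

    text_embeddings: (K, C), one row per class name (hypothetical input).
    Returns (best_class_index, scores) where scores has shape (K,).
    """
    pooled = mask_pool(features, mask)
    pooled = pooled / (np.linalg.norm(pooled) + eps)
    text = text_embeddings / (
        np.linalg.norm(text_embeddings, axis=1, keepdims=True) + eps
    )
    scores = text @ pooled
    return int(np.argmax(scores)), scores
```

In this sketch the classification depends only on the features averaged inside the mask, which is the step the abstract identifies as a performance bottleneck even when the mask itself is accurate.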

Yongkang Li, Tianheng Cheng, Bin Feng, Wenyu Liu, Xinggang Wang • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Semantic segmentation | ADE20K A-150 | mIoU 38.2 | 188 |
| Semantic segmentation | Pascal Context 59 | mIoU 60.4 | 164 |
| Semantic segmentation | ADE20K 847 | mIoU 1.62e+3 | 83 |
| Semantic segmentation | ADE20K A-847 (val) | mIoU 11.4 | 70 |
| Open Vocabulary Semantic Segmentation | Pascal VOC 20 | mIoU 95.8 | 62 |
| Open Vocabulary Semantic Segmentation | ADE-847 | mIoU 16.2 | 59 |
| Semantic segmentation | Pascal Context 459 | mIoU 22.7 | 58 |
| Open Vocabulary Semantic Segmentation | Pascal Context PC-59 | mIoU 60.4 | 57 |
| Open Vocabulary Semantic Segmentation | ADE20K A-150 | mIoU 38.2 | 54 |
| Semantic segmentation | Dv 58-class (val) | ACDC-4159.7 | 46 |
Showing 10 of 25 rows
