Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation

About

We address the task of weakly-supervised few-shot image classification and segmentation by leveraging a Vision Transformer (ViT) pretrained with self-supervision. Our proposed method takes token representations from the self-supervised ViT and leverages their correlations, via self-attention, to produce classification and segmentation predictions through separate task heads. Our model is able to effectively learn to perform classification and segmentation in the absence of pixel-level labels during training, using only image-level labels. To do this, it uses attention maps, created from tokens generated by the self-supervised ViT backbone, as pixel-level pseudo-labels. We also explore a practical setup with "mixed" supervision, where a small number of training images contain ground-truth pixel-level labels and the remaining images have only image-level labels. For this mixed setup, we propose to improve the pseudo-labels using a pseudo-label enhancer that was trained using the available ground-truth pixel-level labels. Experiments on Pascal-5i and COCO-20i demonstrate significant performance gains in a variety of supervision settings, in particular when few or no pixel-level labels are available.
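The idea of turning token attention into pixel-level pseudo-labels can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes attention is a softmax over scaled dot products between one global token and the patch tokens, and that a patch is marked foreground when its attention exceeds the mean. The function names `attention_over_patches` and `pseudo_mask`, the mean-threshold rule, and the toy vectors are all hypothetical.

```python
import math

def attention_over_patches(global_token, patch_tokens):
    """Softmax over scaled dot products between one global token and each patch token."""
    scale = math.sqrt(len(global_token))
    scores = [
        sum(g * p for g, p in zip(global_token, patch)) / scale
        for patch in patch_tokens
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_mask(attention, grid_h, grid_w):
    """Binarize attention at its mean (an assumed rule) and reshape to a patch grid."""
    if len(attention) != grid_h * grid_w:
        raise ValueError("attention length must equal grid_h * grid_w")
    threshold = sum(attention) / len(attention)
    flat = [1 if a > threshold else 0 for a in attention]
    return [flat[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]

# Toy 2x2 patch grid with 2-dimensional token features (made-up values).
global_token = [1.0, 0.0]
patch_tokens = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.1], [-1.0, 0.0]]

attn = attention_over_patches(global_token, patch_tokens)
mask = pseudo_mask(attn, grid_h=2, grid_w=2)
print(mask)
```

In this toy case the patches whose features align with the global token receive above-average attention and become foreground in the pseudo-mask. A real pipeline would take tokens from a pretrained ViT and upsample the patch-level mask to image resolution.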

Dahyun Kang, Piotr Koniusz, Minsu Cho, Naila Murray • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Semantic segmentation | COCO-20i | mIoU (Mean): 19.6 | 132 |
| Few-shot Segmentation | Pascal-5^i 1-way 1-shot | -- | 71 |
| Few-shot Segmentation | COCO-20 | mIoU: 48.7 | 22 |
| Few-shot Segmentation | Pascal-5^i 2-way 1-shot | Score (S=0): 35.7 | 9 |
| Few-shot classification | Pascal-5^i 2-way 1-shot | Accuracy (S^0): 74.3 | 8 |
| Few-shot classification | Pascal-5^i 1-way 1-shot | Accuracy (S^0): 84 | 8 |
| Classification | Pascal-5^i 1-shot (test) | 1-way Acc: 85.7 | 5 |
| Few-Shot Classification and Segmentation | Pascal-5i 1-way 1-shot | Classification 0/1 Exact Ratio (S0): 86.9 | 5 |
| Few-Shot Classification and Segmentation | Pascal-5i 2-way 1-shot | Classification 0/1 Ratio (S0): 70.3 | 5 |
| Few-shot Segmentation | COCO-20i 1-way 1-shot | mIoU: 38.3 | 5 |
Showing 10 of 15 rows
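
Several of the results above are reported as mIoU (mean intersection over union). As a rough illustration of the metric, here is a minimal sketch that averages per-mask IoU over binary masks. The helper names `iou` and `mean_iou` are hypothetical, and benchmarks differ in how they aggregate (for example per class or per episode), so this is not necessarily the protocol behind the numbers listed here.

```python
def iou(pred, target):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    inter = sum(p & t for prow, trow in zip(pred, target) for p, t in zip(prow, trow))
    union = sum(p | t for prow, trow in zip(pred, target) for p, t in zip(prow, trow))
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0

def mean_iou(pairs):
    """Average IoU over (prediction, ground truth) pairs, as a percentage."""
    scores = [iou(pred, target) for pred, target in pairs]
    return 100.0 * sum(scores) / len(scores)

# Two made-up 2x2 examples: one half-correct prediction, one exact match.
pairs = [
    ([[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # intersection 1, union 2
    ([[0, 1], [0, 1]], [[0, 1], [0, 1]]),  # intersection 2, union 2
]
print(mean_iou(pairs))
```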
