Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

mc-BEiT: Multi-choice Discretization for Image BERT Pre-training

About

Image BERT pre-training with masked image modeling (MIM) becomes a popular practice to cope with self-supervised representation learning. A seminal work, BEiT, casts MIM as a classification task with a visual vocabulary, tokenizing the continuous visual signals into discrete vision tokens using a pre-learned dVAE. Despite a feasible solution, the improper discretization hinders further improvements of image pre-training. Since image discretization has no ground-truth answers, we believe that the masked patch should not be assigned with a unique token id even if a better tokenizer can be obtained. In this work, we introduce an improved BERT-style image pre-training method, namely mc-BEiT, which performs MIM proxy tasks towards eased and refined multi-choice training objectives. Specifically, the multi-choice supervision for the masked image patches is formed by the soft probability vectors of the discrete token ids, which are predicted by the off-the-shelf image tokenizer and further refined by high-level inter-patch perceptions resorting to the observation that similar patches should share their choices. Extensive experiments on classification, segmentation, and detection tasks demonstrate the superiority of our method, e.g., the pre-trained ViT-B achieves 84.1% top-1 fine-tuning accuracy on ImageNet-1K classification, 49.2% AP^b and 44.0% AP^m of object detection and instance segmentation on COCO, 50.8% mIOU on ADE20K semantic segmentation, outperforming the competitive counterparts. The code will be available at https://github.com/lixiaotong97/mc-BEiT.

Xiaotong Li, Yixiao Ge, Kun Yi, Zixuan Hu, Ying Shan, Ling-Yu Duan• 2022

Related benchmarks

TaskDatasetResultRank
Semantic segmentationADE20K (val)
mIoU50.8
2731
Object DetectionCOCO 2017 (val)--
2454
Image ClassificationImageNet-1K 1.0 (val)
Top-1 Accuracy85.6
1866
Instance SegmentationCOCO 2017 (val)
APm0.44
1144
Instance SegmentationCOCO
APmask44.3
279
Object DetectionCOCO
APb51.2
44
Showing 6 of 6 rows

Other info

Code

Follow for update