A Simple Data Mixing Prior for Improving Self-Supervised Learning
About
Data mixing (e.g., Mixup, CutMix, ResizeMix) is an essential component for advancing recognition models. In this paper, we study its effectiveness in the self-supervised setting. Noticing that mixed images sharing the same source images are intrinsically related to each other, we propose SDMP, short for **S**imple **D**ata **M**ixing **P**rior, to capture this straightforward yet essential prior, and position such mixed images as additional **positive pairs** to facilitate self-supervised representation learning. Our experiments verify that the proposed SDMP enables data mixing to help a set of self-supervised learning frameworks (e.g., MoCo) achieve better accuracy and out-of-distribution robustness. More notably, our SDMP is the first method that successfully leverages data mixing to improve (rather than hurt) the performance of Vision Transformers in the self-supervised setting. Code is publicly available at https://github.com/OliverRensu/SDMP
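To make the prior concrete, here is a minimal NumPy sketch of the idea described above: two Mixup images built from the *same* source pair are intrinsically related, so they can serve as an additional positive pair for a contrastive objective. The function names (`mixup`, `view1`, `view2`) are illustrative only and do not come from the authors' codebase.

```python
import numpy as np

def mixup(x_a, x_b, lam):
    """Convexly combine two images with mixing ratio lam (standard Mixup)."""
    return lam * x_a + (1.0 - lam) * x_b

rng = np.random.default_rng(0)
img_a = rng.random((32, 32, 3))  # source image A
img_b = rng.random((32, 32, 3))  # source image B

# Two mixed images that share the same source images; the SDMP idea is to
# treat such a pair as an extra positive pair during self-supervised training.
lam1, lam2 = 0.7, 0.4
view1 = mixup(img_a, img_b, lam1)
view2 = mixup(img_a, img_b, lam2)
```

Because `view1` and `view2` are different convex combinations of the same sources, their representations should be pulled together (weighted by the mixing ratios in the actual method), alongside the usual augmented-view positives.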
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | CIFAR-100 (test) | -- | 3518 |
| Image Classification | CIFAR-10 (test) | -- | 3381 |
| Image Classification | ImageNet-1k (val) | Top-1 Acc: 80 | 706 |
| Image Classification | ImageNet 1% labeled | -- | 118 |
| Image Classification | ImageNet (10% labels) | Top-1 Acc: 68 | 98 |
| Image Classification | ImageNet-A (val) | Accuracy: 21.1 | 55 |
| Linear Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy: 73.8 | 48 |
| Linear Classification | ImageNet-1k (val) | Top-1 Accuracy: 73.5 | 37 |
| Image Classification | ImageNet-100 small-scale (test) | Top-1 Acc: 83.2 | 5 |
| Image Classification | ImageNet-R original (val) | Top-1 Acc: 45.3 | 4 |