MixMatch: A Holistic Approach to Semi-Supervised Learning
About
Semi-supervised learning has proven to be a powerful paradigm for leveraging unlabeled data to mitigate the reliance on large labeled datasets. In this work, we unify the current dominant approaches for semi-supervised learning to produce a new algorithm, MixMatch, that works by guessing low-entropy labels for data-augmented unlabeled examples and mixing labeled and unlabeled data using MixUp. We show that MixMatch obtains state-of-the-art results by a large margin across many datasets and labeled data amounts. For example, on CIFAR-10 with 250 labels, we reduce error rate by a factor of 4 (from 38% to 11%) and by a factor of 2 on STL-10. We also demonstrate how MixMatch can help achieve a dramatically better accuracy-privacy trade-off for differential privacy. Finally, we perform an ablation study to tease apart which components of MixMatch are most important for its success.
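The two core operations named above, guessing low-entropy labels for augmented unlabeled examples and mixing examples with MixUp, can be illustrated with a minimal NumPy sketch. The abstract does not give the details, so several choices here are assumptions: entropy is lowered by temperature sharpening of predictions averaged over `K` augmentations, the MixUp coefficient is drawn from a Beta distribution and clamped so the first input dominates, and the names `sharpen`, `guess_labels`, `mixup`, `entropy`, `T`, and `alpha` are illustrative only. The model and the augmentation function are omitted; the sketch starts from predicted class probabilities.

```python
import numpy as np


def sharpen(p, T=0.5):
    """Lower the entropy of each probability row: raise to 1/T, renormalize.

    T is an assumed temperature; T < 1 pushes mass toward the largest class.
    """
    p = np.asarray(p, dtype=float) ** (1.0 / T)
    return p / p.sum(axis=-1, keepdims=True)


def guess_labels(probs_per_aug, T=0.5):
    """Guess labels for unlabeled examples.

    probs_per_aug has shape (K, N, C): class probabilities predicted for
    K augmented copies of each of N unlabeled examples (assumed layout).
    """
    avg = np.mean(probs_per_aug, axis=0)  # average over the K augmentations
    return sharpen(avg, T)                # make the averaged guess low-entropy


def mixup(x1, y1, x2, y2, alpha=0.75, rng=None):
    """Convexly combine two (input, label) pairs, as in MixUp.

    The coefficient is clamped to >= 0.5 so the result stays closer to
    (x1, y1); this clamp and the default alpha are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2


def entropy(p):
    """Shannon entropy of each probability row (natural log)."""
    return -np.sum(p * np.log(p + 1e-12), axis=-1)


if __name__ == "__main__":
    # Two augmentations (K=2) of two unlabeled examples (N=2), two classes.
    probs = np.array([[[0.6, 0.4], [0.3, 0.7]],
                      [[0.8, 0.2], [0.5, 0.5]]])
    guessed = guess_labels(probs)
    print("guessed labels:\n", guessed)
    print("entropy before/after:", entropy(probs.mean(axis=0)), entropy(guessed))
```

In a full training loop, the guessed labels would be paired with the augmented unlabeled inputs, and `mixup` would be applied to both the labeled and the unlabeled batches before the losses are computed. That loop is outside the scope of this sketch.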
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | CIFAR-100 (test) | Accuracy | 67.8 | 3518 |
| Image Classification | CIFAR-10 (test) | Accuracy | 84.59 | 3381 |
| Person Re-Identification | Market1501 (test) | Rank-1 Accuracy | 88.18 | 1264 |
| Image Classification | CIFAR-10 (test) | -- | -- | 906 |
| Image Classification | CIFAR-100 | -- | -- | 622 |
| Image Classification | CIFAR10 (test) | Accuracy | 93.58 | 585 |
| Image Classification | CIFAR-10 | Accuracy | 89 | 507 |
| Person Re-Identification | MSMT17 (test) | Rank-1 Acc | 52.99 | 499 |
| Image Classification | CIFAR-10 | Accuracy | 91.51 | 471 |
| Image Classification | CIFAR100 (test) | Top-1 Accuracy | 71.69 | 377 |