Temporal Ensembling for Semi-Supervised Learning
About
In this paper, we present a simple and efficient method for training deep neural networks in a semi-supervised setting where only a small portion of training data is labeled. We introduce self-ensembling, where we form a consensus prediction of the unknown labels using the outputs of the network-in-training on different epochs, and most importantly, under different regularization and input augmentation conditions. This ensemble prediction can be expected to be a better predictor for the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training. Using our method, we set new records for two standard semi-supervised learning benchmarks, reducing the (non-augmented) classification error rate from 18.44% to 7.05% in SVHN with 500 labels and from 18.63% to 16.55% in CIFAR-10 with 4000 labels, and further to 5.12% and 12.16% by enabling the standard augmentations. We additionally obtain a clear improvement in CIFAR-100 classification accuracy by using random images from the Tiny Images dataset as unlabeled extra inputs during training. Finally, we demonstrate good tolerance to incorrect labels.
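The core of the method described above is maintaining an exponential moving average of past network predictions and using it (after a startup bias correction) as a training target for all samples, labeled or not. The following is a minimal NumPy sketch of that update and of a combined supervised/unsupervised loss; the function names, the `-1` convention for unlabeled samples, and the scalar weight `w` are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def temporal_ensembling_targets(Z, z_epoch, alpha, t):
    """One epoch of the temporal-ensembling target update (sketch).

    Z       : running ensemble of per-sample predictions, shape (N, C)
    z_epoch : network outputs collected during this epoch, shape (N, C)
    alpha   : EMA momentum in [0, 1)
    t       : current epoch number, starting at 1
    """
    # Accumulate an exponential moving average of past predictions.
    Z = alpha * Z + (1.0 - alpha) * z_epoch
    # Correct the startup bias so early targets are not scaled toward zero.
    z_target = Z / (1.0 - alpha ** t)
    return Z, z_target

def semi_supervised_loss(pred, labels, z_target, w):
    """Cross-entropy on labeled samples plus weighted MSE to ensemble targets.

    labels uses -1 to mark unlabeled samples (an assumed convention here).
    """
    # Unsupervised consistency term is averaged over ALL samples.
    mse = np.mean((pred - z_target) ** 2)
    # Supervised term only where a label exists.
    mask = labels >= 0
    eps = 1e-12
    ce = -np.mean(np.log(pred[mask, labels[mask]] + eps)) if mask.any() else 0.0
    return ce + w * mse
```

In practice the weight `w` is ramped up from zero over the first epochs, so that training is initially dominated by the supervised term while the ensemble targets are still unreliable.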
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | CIFAR-100 (test) | Accuracy: 89.11 | 3518 |
| Image Classification | CIFAR-10 (test) | Accuracy: 98.34 | 3381 |
| Image Classification | CIFAR-100 | -- | 622 |
| Image Classification | CIFAR10 (test) | -- | 585 |
| Image Classification | CIFAR-10 | -- | 507 |
| Image Classification | TinyImageNet (test) | Accuracy: 89.02 | 366 |
| Image Classification | SVHN (test) | -- | 362 |
| Image Classification | ImageNet (test) | Top-1 Accuracy: 78.46 | 291 |
| Classification | SVHN (test) | Error Rate: 2.5 | 182 |
| Image Classification | Caltech-101 | Top-1 Accuracy: 83.5 | 146 |