MaxUp: A Simple Way to Improve Generalization of Neural Network Training
About
We propose \emph{MaxUp}, an embarrassingly simple, highly effective technique for improving the generalization performance of machine learning models, especially deep neural networks. The idea is to generate a set of augmented copies of the data with random perturbations or transforms and minimize the maximum, or worst-case, loss over the augmented copies. By doing so, we implicitly introduce a smoothness or robustness regularization against the random perturbations, and hence improve the generalization performance. For example, in the case of Gaussian perturbation, \emph{MaxUp} is asymptotically equivalent to using the gradient norm of the loss as a penalty to encourage smoothness. We test \emph{MaxUp} on a range of tasks, including image classification, language modeling, and adversarial certification, on which \emph{MaxUp} consistently outperforms the existing best baseline methods, without introducing substantial computational overhead. In particular, we improve ImageNet classification from the state-of-the-art top-1 accuracy of $85.5\%$ without extra data to $85.8\%$. Code will be released soon.
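The core idea above can be sketched in a few lines: draw $m$ randomly perturbed copies of an input, evaluate the loss on each, and take the training gradient on the worst-case copy only. The sketch below is a minimal illustration with Gaussian perturbations and a toy quadratic loss, not the authors' released code; the function names `maxup_loss_and_grad`, `loss_fn`, and `grad_fn` are hypothetical helpers introduced here for clarity.

```python
import numpy as np

def maxup_loss_and_grad(loss_fn, grad_fn, x, theta, m=4, sigma=0.1, rng=None):
    """MaxUp step with Gaussian perturbations (illustrative sketch).

    Draws m perturbed copies of the input x, evaluates the loss on each,
    and returns the loss and parameter gradient of the maximum-loss copy.
    """
    rng = rng or np.random.default_rng(0)
    perturbed = [x + sigma * rng.standard_normal(x.shape) for _ in range(m)]
    losses = [loss_fn(xp, theta) for xp in perturbed]
    worst = int(np.argmax(losses))  # index of the worst-case augmented copy
    return losses[worst], grad_fn(perturbed[worst], theta)

# Toy example: quadratic loss L(x, theta) = 0.5 * ||theta - x||^2,
# whose gradient with respect to theta is (theta - x).
loss_fn = lambda x, th: 0.5 * float(np.sum((th - x) ** 2))
grad_fn = lambda x, th: th - x

x = np.zeros(3)
theta = np.ones(3)
loss, grad = maxup_loss_and_grad(loss_fn, grad_fn, x, theta, m=4, sigma=0.1)
```

In a real training loop, the returned gradient would be fed to the usual optimizer; the only change relative to standard training is the inner maximum over the $m$ augmented copies, which is why the computational overhead stays small for modest $m$.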
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | Perplexity (PPL) | 39.61 | 1541 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 85.8 | 1453 |
| Image Classification | ImageNet (val) | Top-1 Accuracy | 78.9 | 1206 |
| Image Classification | CIFAR-10 (test) | Accuracy | 97.18 | 906 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 85.8 | 706 |
| Image Classification | ImageNet ILSVRC-2012 (val) | Top-1 Accuracy | 78.9 | 405 |
| Language Modeling | WikiText-2 (val) | Perplexity (PPL) | 41.29 | 277 |
| Image Classification | ImageNet 2012 (val) | Top-1 Accuracy | 85.8 | 202 |
| Image Classification | CIFAR-100 (test) | Accuracy | 82.48 | 147 |
| Language Modeling | Penn Treebank (PTB) (test) | Perplexity (PPL) | 50.29 | 120 |