ResMLP: Feedforward networks for image classification with data-efficient training
About
We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We also train ResMLP models in a self-supervised setup, to further remove the priors that come from employing a labelled dataset. Finally, by adapting our model to machine translation we achieve surprisingly good results. We share pre-trained models and our code based on the timm library.
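The two alternating sublayers described above can be sketched in plain Python. This is a minimal toy illustration, not the released implementation: the real model also wraps each sublayer in learned Affine (normalization-free) transformations, which are omitted here, and the weight names `A`, `W1`, `W2` are hypothetical. Patches are represented as a `P x C` list of lists.

```python
import math

def matvec(W, v):
    """Multiply matrix W (rows of length len(v)) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def transpose(x):
    return [list(col) for col in zip(*x)]

def gelu(v):
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def resmlp_block(x, A, W1, W2):
    """One toy ResMLP block on x: P patches x C channels.

    A  : P x P cross-patch mixing matrix (sublayer i)
    W1 : H x C and W2 : C x H -- the per-patch two-layer MLP (sublayer ii)
    """
    # (i) linear layer in which patches interact, applied to each
    #     channel independently and identically, plus a residual add
    mixed = transpose([matvec(A, ch) for ch in transpose(x)])
    x = [[xi + mi for xi, mi in zip(xr, mr)] for xr, mr in zip(x, mixed)]
    # (ii) two-layer feed-forward network mixing channels within
    #      each patch independently, plus a residual add
    out = []
    for patch in x:
        h = [gelu(v) for v in matvec(W1, patch)]
        y = matvec(W2, h)
        out.append([p + yi for p, yi in zip(patch, y)])
    return out
```

Because both sublayers are residual, zeroing all weights makes the block the identity map, which is a quick sanity check on the wiring.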
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy | 81 | 1866 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 81 | 1453 |
| Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy (%) | 81 | 1155 |
| Image Classification | ImageNet-1k (val) | Top-1 Accuracy | 81 | 840 |
| Image Classification | ImageNet 1k (test) | Top-1 Accuracy | 81 | 798 |
| Image Classification | CIFAR-100 | Top-1 Accuracy | 89.5 | 622 |
| Image Classification | ImageNet-1K | Top-1 Accuracy | 79.4 | 524 |
| Image Classification | ImageNet V2 | Top-1 Accuracy | 65.5 | 487 |
| Image Classification | Stanford Cars | -- | -- | 477 |
| Image Classification | CIFAR-10 | -- | -- | 471 |