Scalable Transfer Learning with Expert Models
About
Transfer of pre-trained representations can improve sample efficiency and reduce computational requirements for new tasks. However, the representations used for transfer are usually generic and not tailored to a particular distribution of downstream tasks. We explore the use of expert representations for transfer with a simple yet effective strategy. We train a diverse set of experts by exploiting existing label structures, and use cheap-to-compute performance proxies to select the relevant expert for each target task. This strategy scales the process of transferring to new tasks, since it does not revisit the pre-training data during transfer. Accordingly, it requires little extra compute per target task, and results in a speed-up of 2-3 orders of magnitude compared to competing approaches. Further, we provide an adapter-based architecture able to compress many experts into a single model. We evaluate our approach on two different data sources and demonstrate that it outperforms baselines on over 20 diverse vision tasks in both cases.
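To make the selection step concrete, here is a minimal sketch of proxy-based expert selection, assuming a k-NN accuracy proxy computed on frozen expert embeddings of the target training set (one common cheap proxy). The `experts`/`embed` interface, function names, and the toy data are illustrative assumptions, not the paper's actual API.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def knn_proxy_score(embeddings, labels, k=1, folds=5):
    """Cheap accuracy proxy: cross-validated k-NN accuracy on the
    frozen embeddings an expert produces for the target training set."""
    knn = KNeighborsClassifier(n_neighbors=k)
    return cross_val_score(knn, embeddings, labels, cv=folds).mean()


def select_expert(experts, images, labels):
    """Score every expert with the proxy and return the best one.

    `experts` maps a name to a feature-extractor callable (a hypothetical
    interface) that returns an (N, D) embedding matrix for `images`.
    """
    scores = {name: knn_proxy_score(embed(images), labels)
              for name, embed in experts.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy demo: three random linear "experts", each embedding
    # 100 inputs of dimension 32 into a 16-dimensional space.
    rng = np.random.default_rng(0)
    images = rng.normal(size=(100, 32))        # stand-in for raw inputs
    labels = rng.integers(0, 5, size=100)
    experts = {
        f"expert_{i}": (lambda x, w=rng.normal(size=(32, 16)): x @ w)
        for i in range(3)
    }
    best, scores = select_expert(experts, images, labels)
    print(best, scores)
```

The key design point this illustrates is that scoring only requires one forward pass per expert over the (small) target training set, so the pre-training data is never revisited and the per-task cost stays low; only the single selected expert is then fine-tuned.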
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | Food-101 | Accuracy | 93.1 | 494 |
| Image Classification | Stanford Cars | Accuracy | 96.1 | 477 |
| Image Classification | CIFAR-10 | Accuracy | 97.9 | 471 |
| Classification | Cars | Accuracy | 96.4 | 314 |
| Image Classification | Aircraft | Accuracy | 94.8 | 302 |
| Image Classification | Pets | -- | -- | 204 |
| Image Classification | FGVC Aircraft | Top-1 Accuracy | 94.8 | 185 |
| Image Classification | VTAB 1k (test) | Accuracy (Natural) | 80.2 | 121 |
| Image Classification | Food | Accuracy | 93.1 | 92 |
| Image Classification | Bird | Accuracy | 84.3 | 29 |