Task Adaptive Parameter Sharing for Multi-Task Learning
About
Adapting pre-trained models with broad capabilities has become standard practice for learning a wide range of downstream tasks. The typical approach of fine-tuning a separate model for each task is performant but incurs a substantial memory cost. To efficiently learn multiple downstream tasks, we introduce Task Adaptive Parameter Sharing (TAPS), a general method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers. This enables multi-task learning while minimizing the resources used and the competition between tasks. TAPS solves a joint optimization problem that determines both which layers to share with the base model and the values of the task-specific weights. Further, a sparsity penalty on the number of active layers encourages weight sharing with the base model. Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters. Moreover, TAPS is agnostic to the model architecture and requires only minor changes to the training scheme. We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
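
The sketch below illustrates the core idea described above: each adapted layer keeps its frozen base weights and learns a gated, task-specific residual, with an L1-style penalty on the gates encouraging layers to fall back to the shared base weights. This is a minimal illustration, not the reference implementation; the class and function names (`TAPSLinear`, `sparsity_penalty`) and the straight-through gating are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TAPSLinear(nn.Module):
    """Sketch of a task-adaptive layer: frozen base weights plus a gated,
    task-specific residual (names and gating details are assumptions)."""

    def __init__(self, base_layer: nn.Linear):
        super().__init__()
        # Shared base weights, kept frozen.
        self.weight_base = nn.Parameter(base_layer.weight.detach().clone(),
                                        requires_grad=False)
        self.bias = (nn.Parameter(base_layer.bias.detach().clone(), requires_grad=False)
                     if base_layer.bias is not None else None)
        # Task-specific residual weights, initialized at zero.
        self.delta = nn.Parameter(torch.zeros_like(self.weight_base))
        # Scalar score deciding whether this layer becomes task-specific.
        self.score = nn.Parameter(torch.tensor(0.1))

    def gate(self) -> torch.Tensor:
        # Hard 0/1 gate with a straight-through gradient (an assumption here).
        hard = (self.score > 0).float()
        return hard + self.score - self.score.detach()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Active layer: base weights plus task-specific residual; inactive: base only.
        w = self.weight_base + self.gate() * self.delta
        return F.linear(x, w, self.bias)


def sparsity_penalty(model: nn.Module, lam: float = 0.25) -> torch.Tensor:
    # L1-style penalty on the number of active layers, added to the task loss.
    gates = [m.gate() for m in model.modules() if isinstance(m, TAPSLinear)]
    return lam * torch.stack(gates).sum()
```

In training, the total objective would be the usual task loss plus `sparsity_penalty(model)`; after convergence, only layers whose gate is 1 need to store a task-specific residual, while all other layers are shared with the base model.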
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Classification | Cars | Accuracy | 89.76 | 314 |
| Image Classification | CUB | Accuracy | 82.65 | 249 |
| Image Classification | Flowers | Accuracy | 96.68 | 127 |
| Image Classification | Visual Decathlon Challenge 1.0 (test) | Mean Accuracy | 78.7 | 81 |
| Image Classification | Sketch | -- | -- | 20 |
| Incremental Multi-Task Learning | DomainNet | Accuracy (Real) | 80.28 | 4 |
| Joint Multi-Task Learning | DomainNet | Real Accuracy | 78.91 | 3 |