ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale
About
Multi-task learning (MTL) has shown considerable practical benefits, particularly when using language models (LMs). While this is commonly achieved by learning $n$ tasks under a joint optimization procedure, some methods, such as AdapterFusion, divide the problem into two stages: (i) task learning, where knowledge specific to a task is encapsulated within sets of parameters (e.g., adapters), and (ii) transfer, where this already learned knowledge is leveraged for a target task. This separation of concerns provides numerous benefits (e.g., promoting reusability). However, current two-stage MTL introduces a substantial number of additional parameters. We address this issue by leveraging the usefulness of linearly scaling the output representations of source adapters for transfer learning. We introduce ScaLearn, a simple and highly parameter-efficient two-stage MTL method that capitalizes on the knowledge of the source tasks by learning a minimal set of scaling parameters that enable effective transfer to a target task. Our experiments on three benchmarks (GLUE, SuperGLUE, and HumSet) and two encoder LMs show that ScaLearn consistently outperforms strong baselines with a small number of transfer parameters (~ $0.35$% of those of AdapterFusion). Remarkably, we observe that ScaLearn maintains its strong abilities even when further reducing parameters, achieving competitive results with only $8$ transfer parameters per target task. Our proposed approach thus demonstrates the power of simple scaling as a promise for more efficient task transfer.
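The core idea, linearly scaling the output representations of source adapters and combining them for a target task, can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the function name `scale_and_combine`, the tensor shapes, the summation used to combine the scaled outputs, and the choice of 8 source tasks are all hypothetical. In a real setup the coefficients `omega` would presumably be learned by gradient descent while the LM and source adapters stay frozen; here they are simply fixed values.

```python
import numpy as np

def scale_and_combine(source_outputs, omega):
    """Combine source adapter outputs by learned linear scaling (sketch).

    source_outputs: array of shape (N, seq_len, d), one output per source adapter.
    omega: scaling coefficients, either shape (N,) for one scalar per source
           task, or shape (N, d) for one coefficient per hidden dimension.
    Returns an array of shape (seq_len, d).
    """
    source_outputs = np.asarray(source_outputs, dtype=float)
    omega = np.asarray(omega, dtype=float)
    if omega.ndim == 1:
        weights = omega[:, None, None]   # broadcast over tokens and dimensions
    elif omega.ndim == 2:
        weights = omega[:, None, :]      # broadcast over tokens only
    else:
        raise ValueError("omega must have shape (N,) or (N, d)")
    # Scale each source output, then sum over the source-task axis.
    return (weights * source_outputs).sum(axis=0)

# Hypothetical sizes: 8 source tasks, 4 tokens, hidden size 16.
num_sources, seq_len, hidden = 8, 4, 16
rng = np.random.default_rng(0)
outputs = rng.normal(size=(num_sources, seq_len, hidden))

# One scalar per source task: only `num_sources` transfer parameters.
omega_uniform = np.full(num_sources, 1.0 / num_sources)
combined = scale_and_combine(outputs, omega_uniform)

# One coefficient per source task and hidden dimension: num_sources * hidden.
omega_per_dim = np.ones((num_sources, hidden))
combined_per_dim = scale_and_combine(outputs, omega_per_dim)
```

Under these assumptions, the scalar-per-source variant needs only as many transfer parameters as there are source tasks, which suggests how a budget as small as the 8 parameters mentioned above could arise if the coefficients were shared across layers; the abstract itself does not spell out that configuration.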
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Natural Language Understanding | SuperGLUE | SGLUE Score | 75.74 | 84 |
| General Language Understanding | GLUE v1 (test dev) | MNLI | 87.06 | 40 |
| Natural Language Understanding | GLUE and SuperGLUE (test val) | SST-2 | 95.7 | 37 |
| Natural Language Understanding | GLUE RoBERTa LARGE (test dev) | MNLI Accuracy | 90.31 | 22 |
| Classification | HumSet XLM-RBASE (test) | Sectors Score | 72.38 | 17 |
| Multilingual Multi-label Text Classification | HumSet (test) | Sectors | 73.32 | 17 |
| Natural Language Understanding | SuperGLUE RoBERTa-large (test) | ReCoRD | 88.85 | 17 |