
Composable Sparse Fine-Tuning for Cross-Lingual Transfer

About

Fine-tuning the entire set of parameters of a large pretrained model has become the mainstream approach for transfer learning. To increase its efficiency and prevent catastrophic forgetting and interference, techniques like adapters and sparse fine-tuning have been developed. Adapters are modular, as they can be combined to adapt a model towards different facets of knowledge (e.g., dedicated language and/or task adapters). Sparse fine-tuning is expressive, as it controls the behavior of all model components. In this work, we introduce a new fine-tuning method with both these desirable properties. In particular, we learn sparse, real-valued masks based on a simple variant of the Lottery Ticket Hypothesis. Task-specific masks are obtained from annotated data in a source language, and language-specific masks from masked language modeling in a target language. Both these masks can then be composed with the pretrained model. Unlike adapter-based fine-tuning, this method neither increases the number of parameters at inference time nor alters the original model architecture. Most importantly, it outperforms adapters in zero-shot cross-lingual transfer by a large margin in a series of multilingual benchmarks, including Universal Dependencies, MasakhaNER, and AmericasNLI. Based on an in-depth analysis, we additionally find that sparsity is crucial to prevent both 1) interference between the fine-tunings to be composed and 2) overfitting. We release the code and models at https://github.com/cambridgeltl/composable-sft.
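The core idea of composing sparse fine-tunings can be illustrated with a minimal sketch. This is not the authors' implementation (their Lottery Ticket-style method operates per-tensor on transformer weights and re-trains only the selected parameters); here we simply show, on flat parameter vectors, how a sparse difference vector is extracted by magnitude of change and how several such differences are added onto the pretrained weights. Function names and the top-k selection are illustrative assumptions.

```python
import numpy as np

def sparse_diff(theta_pretrained, theta_finetuned, k):
    """Keep only the k largest-magnitude parameter changes.

    Illustrative stand-in for a sparse fine-tuning: everything except
    the k most-changed parameters is zeroed out.
    """
    diff = theta_finetuned - theta_pretrained
    # Indices of the k largest absolute changes.
    keep = np.argsort(np.abs(diff))[-k:]
    sparse = np.zeros_like(diff)
    sparse[keep] = diff[keep]
    return sparse

def compose(theta_pretrained, *sparse_diffs):
    """Add sparse task/language difference vectors onto the pretrained
    weights; if their supports barely overlap, they interfere little."""
    return theta_pretrained + sum(sparse_diffs)
```

For zero-shot cross-lingual transfer, one would compose a task difference (learned from annotated source-language data) with a language difference (learned from masked language modeling in the target language): `compose(theta0, task_diff, lang_diff)`. Sparsity keeps the two differences from clashing on the same parameters.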

Alan Ansell, Edoardo Maria Ponti, Anna Korhonen, Ivan Vulić • 2021

Related benchmarks

Task                        Dataset                                 Result                  Rank
Sentiment Analysis          MultiMM CN (test)                       F1 Score: 69.02         24
Sentiment Analysis          MultiMM EN (test)                       F1 Score: 69.93         24
Metaphor Detection          MultiMM EN (test)                       F1 Score: 68.15         24
Metaphor Detection          MultiMM CN (test)                       F1 Score: 66.41         24
Named Entity Recognition    MasakhaNER (test)                       F1 Score: 71.7          19
Named Entity Recognition    PAN-X                                   Macro Avg Score: 0.825  16
Dependency Parsing          Universal Dependencies 2.7 (test)       AR DP Score: 70.8       14
Question Answering          XQuAD                                   F1 (ar): 73             12
Named Entity Recognition    NER Average over all languages (test)   F1 Score: 71.7          9
Natural Language Inference  AmericasNLI (test)                      Accuracy (aym): 58.1    9
(10 of 18 rows shown.)

Other info

Code: https://github.com/cambridgeltl/composable-sft