Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning
About
Recent pretrained language models have grown from millions to billions of parameters, so the need to fine-tune an extremely large pretrained model on a limited training corpus arises in various downstream tasks. In this paper, we propose a straightforward yet effective fine-tuning technique, Child-Tuning, which updates only a subset of the parameters (called the child network) of a large pretrained model by strategically masking out the gradients of the non-child network during the backward pass. Experiments on various downstream tasks in the GLUE benchmark show that Child-Tuning consistently outperforms vanilla fine-tuning by 1.5~8.6 average score points across four different pretrained models, and surpasses prior fine-tuning techniques by 0.6~1.3 points. Furthermore, empirical results on domain transfer and task transfer show that Child-Tuning obtains better generalization performance by large margins.
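The core mechanic described above, masking out non-child-network gradients during the backward pass, can be sketched in plain Python. This is an illustrative sketch of the task-free variant (Child-Tuning_F as described in the paper), not the authors' implementation: the function name, the list-of-floats gradient representation, and the default seed are all assumptions for the example. Each gradient entry is kept with probability `p_f` (forming the child network for that step) and zeroed otherwise, with survivors rescaled by `1/p_f` so the expected update magnitude is unchanged.

```python
import random

def child_tuning_f_step(grads, p_f, rng=None):
    """Apply a Child-Tuning_F-style gradient mask (illustrative sketch).

    grads: flat list of gradient values for the model parameters.
    p_f:   probability that a given gradient entry belongs to the
           child network and is therefore kept.
    rng:   optional random.Random instance for reproducibility.

    Returns a new list where non-child gradients are zeroed and child
    gradients are rescaled by 1/p_f to keep the update unbiased.
    """
    rng = rng or random.Random(0)  # fixed seed is an example choice
    masked = []
    for g in grads:
        keep = rng.random() < p_f  # Bernoulli(p_f) draw per entry
        masked.append(g / p_f if keep else 0.0)
    return masked
```

In a real training loop this masking would be applied to parameter gradients after `backward()` and before the optimizer step, e.g. via gradient hooks in a deep-learning framework.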
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Natural Language Inference | RTE | Accuracy | 70.87 | 367 |
| Natural Language Inference | SNLI | Accuracy | 84.41 | 174 |
| Natural Language Understanding | GLUE (val) | -- | -- | 170 |
| Natural Language Inference | MNLI (matched) | Accuracy | 79.13 | 110 |
| Natural Language Inference | MNLI | -- | -- | 80 |
| Question Answering | SQuAD (val) | F1 Score | 88.5 | 26 |
| Binary Classification | AdvGLUE (test) | QNLI Accuracy | 0.496 | 17 |
| Natural Language Inference | SICK | Accuracy | 55.69 | 15 |
| Commonsense Reasoning | SWAG (val) | Accuracy | 83.7 | 9 |
| Natural Language Inference | SciTail | Accuracy | 79.86 | 8 |