
BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models

About

We introduce BitFit, a sparse fine-tuning method in which only the bias terms of the model (or a subset of them) are modified. We show that with small-to-medium training data, applying BitFit to pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model. For larger data, the method is competitive with other sparse fine-tuning methods. Beyond their practical utility, these findings bear on the question of understanding the commonly used fine-tuning process: they support the hypothesis that fine-tuning mainly exposes knowledge induced by language-modeling training, rather than teaching new task-specific linguistic knowledge.
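The core idea above, freezing all weights and training only the bias terms, can be sketched in a few lines of PyTorch. This is a minimal illustration using a toy transformer encoder as a stand-in for a pre-trained BERT model; the selection rule (any parameter whose name contains "bias") is an assumption about naming conventions, not the authors' exact code.

```python
import torch.nn as nn

# Toy stand-in for a pre-trained transformer encoder.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
    num_layers=2,
)

# BitFit-style selection: freeze everything except bias terms.
# (Matches names like "linear1.bias", "norm1.bias", "self_attn.in_proj_bias".)
for name, param in model.named_parameters():
    param.requires_grad = "bias" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total} ({100 * trainable / total:.2f}%)")
```

An optimizer would then be built only over the trainable subset, e.g. `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)`; the frozen weights keep their pre-trained values throughout training.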

Elad Ben-Zaken, Shauli Ravfogel, Yoav Goldberg • 2021

Related benchmarks

Task                             Dataset                  Result                   Rank
Image Classification             CIFAR-100 (test)         Accuracy: 81.09          3518
Semantic Segmentation            ADE20K (val)             mIoU: 48.37              2731
Image Classification             ImageNet-1K 1.0 (val)    Top-1 Accuracy: 82.74    1866
Commonsense Reasoning            PIQA                     Accuracy: 76.6           647
Natural Language Understanding   GLUE (dev)               SST-2 (Acc): 95.4        504
Image Classification             Stanford Cars            Accuracy: 79.4           477
Text-to-Image Retrieval          Flickr30K                R@1: 67.4                460
Natural Language Understanding   GLUE                     SST-2: 96.1              452
Natural Language Understanding   GLUE (test)              SST-2 Accuracy: 95.09    416
Oriented Object Detection        DOTA v1.0 (test)         --                       378

Showing 10 of 213 rows.

Other info

Code
