BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models

About

We introduce BitFit, a sparse fine-tuning method in which only the bias terms of the model (or a subset of them) are modified. We show that with small-to-medium training data, applying BitFit to pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model. For larger data, the method is competitive with other sparse fine-tuning methods. Beyond their practical utility, these findings bear on the question of what the commonly used fine-tuning process actually does: they support the hypothesis that fine-tuning is mainly about exposing knowledge induced by language-modeling training, rather than learning new task-specific linguistic knowledge.
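The core idea, training only bias terms while every weight stays frozen, can be illustrated with a minimal toy sketch. This is not the authors' implementation and does not involve BERT: the single linear layer, the names `predict` and `bias_only_finetune`, the data, and the hyperparameters are all hypothetical, chosen only so the example runs with no dependencies.

```python
# Toy illustration of bias-only fine-tuning (the idea behind BitFit).
# Assumption: a single linear layer y = w.x + b stands in for a pre-trained
# model; the weights w are frozen and only the bias b is trained.

def predict(w, b, x):
    """Linear layer: dot(w, x) + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def bias_only_finetune(w, b, data, lr=0.1, epochs=200):
    """Gradient descent on mean squared error, updating only the bias."""
    for _ in range(epochs):
        # d/db of mean((pred - y)^2) is 2 * mean(pred - y)
        grad_b = 2.0 * sum(predict(w, b, x) - y for x, y in data) / len(data)
        b -= lr * grad_b  # the weights w are never updated
    return b

# Hypothetical "pre-trained" parameters.
w_pretrained = [2.0, -1.0]
b_pretrained = 0.0

# Hypothetical task data: same weights, but the targets are shifted by 3.0,
# so adapting the bias alone is enough to fit the new task.
data = [([1.0, 0.0], 5.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 4.0), ([2.0, 1.0], 6.0)]

w_before = list(w_pretrained)
b_tuned = bias_only_finetune(w_pretrained, b_pretrained, data)
```

In a PyTorch model, a comparable effect is commonly obtained by setting `requires_grad = False` on every parameter whose name does not end in `bias` before building the optimizer; which bias terms to include is a design choice the sketch above does not address.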

Elad Ben-Zaken, Shauli Ravfogel, Yoav Goldberg • 2021

Related benchmarks

Task	Dataset	Metric	Result
Image Classification	CIFAR-100 (test)	Accuracy	81.09	3518
Semantic segmentation	ADE20K (val)	mIoU	48.37	3069
Object Detection	COCO 2017 (val)	AP	51.2	2843
Image Classification	ImageNet-1K 1.0 (val)	Top-1 Accuracy	82.74	2238
Commonsense Reasoning	PIQA	Accuracy	76.6	757
Image Classification	Stanford Cars	Accuracy	79.4	660
Text-to-Image Retrieval	Flickr30K	R@1	67.4	559
Natural Language Understanding	GLUE	SST-2	96.1	551
Natural Language Understanding	GLUE (dev)	SST-2 (Acc)	95.4	529
Vision-and-Language Navigation	R2R (val unseen)	Success Rate (SR)	59.17	448

Showing 10 of 270 rows
