BEFT: Bias-Efficient Fine-Tuning of Language Models in Low-Data Regimes
About
Fine-tuning the bias terms of large language models (LLMs) has the potential to achieve unprecedented parameter efficiency while maintaining competitive performance, particularly in low-data regimes. However, the link between fine-tuning different bias terms (i.e., $\boldsymbol{b}_q$, $\boldsymbol{b}_k$, and $\boldsymbol{b}_v$ in the query, key, and value projections, respectively) and downstream performance remains largely unclear. In this paper, we investigate how fine-tuning each of $\boldsymbol{b}_q$, $\boldsymbol{b}_k$, and $\boldsymbol{b}_v$ affects downstream task performance. Our key finding is that directly fine-tuning $\boldsymbol{b}_v$ generally leads to higher downstream performance in low-data regimes than fine-tuning $\boldsymbol{b}_q$ or $\boldsymbol{b}_k$. We evaluate this property across a wide range of LLMs, spanning encoder-only and decoder-only architectures with up to 6.7B parameters (including bias-free LLMs). Our results provide strong evidence for the effectiveness of directly fine-tuning $\boldsymbol{b}_v$ across various downstream tasks. The implementation code is available at https://github.com/whubaichuan/BEFT.
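The selection step described above (train only $\boldsymbol{b}_v$, freeze everything else) can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the `Param` class, the `q_proj`/`k_proj`/`v_proj` naming scheme, and the toy sizes are assumptions made for illustration, and real models name these parameters differently. Bias-free models, which the abstract also covers, are outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for a framework parameter (name, size, trainable flag)."""
    name: str
    numel: int
    requires_grad: bool = True


def toy_attention_params(num_layers=2, hidden=8):
    """Build weights and biases for the q/k/v projections of a toy model.

    The 'layers.{i}.attn.{proj}' naming is an assumption for illustration.
    """
    params = []
    for layer in range(num_layers):
        for proj in ("q_proj", "k_proj", "v_proj"):
            prefix = f"layers.{layer}.attn.{proj}"
            params.append(Param(f"{prefix}.weight", hidden * hidden))
            params.append(Param(f"{prefix}.bias", hidden))
    return params


def freeze_all_but_bias(params, target="v_proj.bias"):
    """Mark only parameters whose name ends with `target` as trainable."""
    for p in params:
        p.requires_grad = p.name.endswith(target)
    return [p for p in params if p.requires_grad]


params = toy_attention_params()
trainable = freeze_all_but_bias(params)  # keep only the value-projection biases

trainable_count = sum(p.numel for p in trainable)
total_count = sum(p.numel for p in params)

print([p.name for p in trainable])
print(f"trainable parameters: {trainable_count} / {total_count}")
```

Passing `target="q_proj.bias"` or `target="k_proj.bias"` selects the other two bias terms, which is the comparison the paper studies. In a deep learning framework the same name-based filter would typically be applied to the model's real parameter list, with the optimizer receiving only the trainable subset.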
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Generation | DROP | F1 Score | 32.4 | 43 |
| Classification | GLUE | SST-2 Accuracy | 95.2 | 14 |
| Classification | SuperGLUE | CB Accuracy | 96.4 | 14 |
| Multiple-Choice | SuperGLUE | COPA Score | 83 | 14 |
| Natural Language Inference | RTE low-data regime GLUE | Accuracy | 58.53 | 4 |