
BEFT: Bias-Efficient Fine-Tuning of Language Models in Low-Data Regimes

About

Fine-tuning the bias terms of large language models (LLMs) has the potential to achieve unprecedented parameter efficiency while maintaining competitive performance, particularly in low-data regimes. However, the link between fine-tuning different bias terms (i.e., $\boldsymbol{b}_q$, $\boldsymbol{b}_k$, and $\boldsymbol{b}_v$ in the query, key, and value projections, respectively) and downstream performance remains largely unclear to date. In this paper, we investigate how fine-tuning $\boldsymbol{b}_q$, $\boldsymbol{b}_k$, and $\boldsymbol{b}_v$ affects performance on the downstream task. Our key finding is that directly fine-tuning $\boldsymbol{b}_v$ generally leads to higher downstream performance in low-data regimes than fine-tuning $\boldsymbol{b}_q$ or $\boldsymbol{b}_k$. We extensively evaluate this property across a wide range of LLMs spanning encoder-only and decoder-only architectures up to 6.7B parameters (including bias-free LLMs). Our results provide strong evidence for the effectiveness of directly fine-tuning $\boldsymbol{b}_v$ across various downstream tasks. The implementation code is available at https://github.com/whubaichuan/BEFT.

Baichuan Huang, Ananth Balashankar, Amir Aminifar • 2025
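
The core idea in the abstract, training only the value-projection bias $\boldsymbol{b}_v$ while everything else stays frozen, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). The function name `select_value_biases`, the stand-in class `FakeParam`, and the name suffixes `.value.bias` and `.v_proj.bias` are assumptions chosen for illustration; actual parameter names depend on the model implementation and should be checked before use. The sketch also does not cover bias-free models, where a value bias would first have to be added.

```python
from dataclasses import dataclass

# Assumed name suffixes for value-projection biases. Real parameter names
# vary by model implementation; inspect your model's names before relying
# on these.
VALUE_BIAS_SUFFIXES = (".value.bias", ".v_proj.bias")


def select_value_biases(named_params, suffixes=VALUE_BIAS_SUFFIXES):
    """Freeze every parameter except value-projection biases.

    named_params: iterable of (name, param) pairs where each param exposes
    a boolean `requires_grad` attribute (for example, the pairs yielded by
    a PyTorch module's named_parameters()).
    Returns the list of names left trainable, in iteration order.
    """
    suffixes = tuple(suffixes)
    trainable = []
    for name, param in named_params:
        keep = name.endswith(suffixes)
        param.requires_grad = keep  # only b_v stays trainable
        if keep:
            trainable.append(name)
    return trainable


@dataclass
class FakeParam:
    """Hypothetical stand-in for a framework parameter object."""
    requires_grad: bool = True


# Tiny demonstration with made-up parameter names.
demo_params = {
    "encoder.layer.0.attention.self.query.bias": FakeParam(),
    "encoder.layer.0.attention.self.key.bias": FakeParam(),
    "encoder.layer.0.attention.self.value.weight": FakeParam(),
    "encoder.layer.0.attention.self.value.bias": FakeParam(),
}
demo_trainable = select_value_biases(demo_params.items())
print(demo_trainable)
```

With a real model, the returned names give a quick check that only the intended bias vectors will receive gradient updates before an optimizer is constructed over the trainable parameters.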

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Generation | DROP | F1 Score: 32.4 | 43 |
| Classification | GLUE | SST-2 Accuracy: 95.2 | 14 |
| Classification | SuperGLUE | CB Accuracy: 96.4 | 14 |
| Multiple-Choice | SuperGLUE | COPA Score: 83 | 14 |
| Natural Language Inference | RTE low-data regime GLUE | Accuracy: 58.53 | 4 |
