Federated Fine-tuning of Large Language Models under Heterogeneous Tasks and Client Resources

About

Federated Learning (FL) has recently been applied to the parameter-efficient fine-tuning of Large Language Models (LLMs). While promising, it raises significant challenges due to the heterogeneous resources and data distributions of clients. This study introduces FlexLoRA, a simple yet effective aggregation scheme for LLM fine-tuning. It mitigates the "bucket effect" in traditional FL, which restricts the potential of clients with ample resources by tying them to the capabilities of the least-resourced participants. FlexLoRA allows for dynamic adjustment of local LoRA ranks, fostering the development of a global model imbued with broader, less task-specific knowledge. By synthesizing a full-size LoRA weight from individual client contributions and employing Singular Value Decomposition (SVD) for weight redistribution, FlexLoRA fully leverages heterogeneous client resources. Our experiments, which involve thousands of clients with heterogeneous NLP tasks and resources, validate the efficacy of FlexLoRA: the federated global model achieves consistent improvements over SOTA FL methods in downstream NLP task performance across various heterogeneous distributions. FlexLoRA's practicality is further underscored by our theoretical analysis and its seamless integration with existing LoRA-based FL methods, offering a path toward cross-device, privacy-preserving federated tuning for LLMs.

Jiamu Bai, Daoyuan Chen, Bingchen Qian, Liuyi Yao, Yaliang Li • 2024
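
The aggregation step described in the abstract (build a full-size LoRA weight from client contributions, then redistribute it with SVD) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `flexlora_style_aggregate`, the weighting of clients by local data size, and the choice to fold the singular values into the `B` factor are all assumptions made for illustration.

```python
import numpy as np


def flexlora_style_aggregate(client_factors, client_sizes):
    """Aggregate LoRA factors of different ranks and redistribute them via SVD.

    client_factors: list of (B_i, A_i) with B_i of shape (d_out, r_i)
                    and A_i of shape (r_i, d_in); ranks r_i may differ.
    client_sizes:   per-client weights (assumed here: local dataset sizes).
    Returns the full-size aggregated update and one (B, A) pair per client,
    truncated to that client's own rank.
    """
    total = float(sum(client_sizes))

    # Step 1: form each client's full-size update B_i @ A_i and average them.
    # Averaging the products (not the factors) works even when ranks differ.
    delta_w = sum(
        (n / total) * (B @ A) for (B, A), n in zip(client_factors, client_sizes)
    )

    # Step 2: decompose the aggregated full-size update.
    U, S, Vt = np.linalg.svd(delta_w, full_matrices=False)

    # Step 3: hand each client the best rank-r_i approximation.
    redistributed = []
    for _, A in client_factors:
        r = A.shape[0]
        B_new = U[:, :r] * S[:r]  # assumption: singular values folded into B
        A_new = Vt[:r, :]
        redistributed.append((B_new, A_new))
    return delta_w, redistributed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_out, d_in = 8, 7
    # Two hypothetical clients: a low-resource one (rank 1), a richer one (rank 4).
    factors = [
        (rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in))) for r in (1, 4)
    ]
    delta_w, per_client = flexlora_style_aggregate(factors, client_sizes=[10, 30])
    for B, A in per_client:
        err = np.linalg.norm(delta_w - B @ A)
        print(f"rank {A.shape[0]}: approximation error {err:.4f}")
```

In this sketch a higher-rank client receives a closer approximation of the aggregated update, while a low-rank client is not forced to set the rank for everyone, which is the "bucket effect" the abstract refers to.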

Related benchmarks

Task	Dataset	Metric	Result	
Mathematical Reasoning	GSM8K	Accuracy (Acc)	35.15	352
Commonsense Reasoning	Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA)	BoolQ Accuracy	64.57	245
Image Classification	DomainNet	Accuracy (ClipArt)	88.7	238
Text reconstruction from gradients	Rotten Tomatoes	ROUGE-1	38.44	68
Commonsense Reasoning	Commonsense Reasoning	BoolQ Accuracy	68.37	54
Text Classification	BANKING77 Dir(0.01) (test)	Accuracy	69.84	45
Image Classification	CIFAR-100 Dirichlet-0.1 (test)	Accuracy	56.23	41
Image Classification	DomainNet (unseen clients)	Average Accuracy	84	34
Commonsense Reasoning	Commonsense Reasoning (test)	Overall Average Accuracy	77.4	31
Cross-task generalization	Super-NaturalInstructions English Track (unseen clients)	Weighted Avg Rouge-L	62.2	27

Showing 10 of 67 rows
