Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

FLoRA: Federated Fine-Tuning Large Language Models with Heterogeneous Low-Rank Adaptations

About

The rapid development of Large Language Models (LLMs) has been pivotal in advancing AI, with pre-trained LLMs being adaptable to diverse downstream tasks through fine-tuning. Federated learning (FL) further enhances fine-tuning in a privacy-aware manner by utilizing clients' local data through in-situ computation, eliminating the need for data movement. However, fine-tuning LLMs, given their massive scale of parameters, poses challenges for clients with constrained and heterogeneous resources in FL. Previous methods employed low-rank adaptation (LoRA) for efficient federated fine-tuning but utilized traditional FL aggregation strategies on LoRA adapters. These approaches led to mathematically inaccurate aggregation noise, reducing fine-tuning effectiveness and failing to address heterogeneous LoRAs. In this work, we first highlight the mathematical incorrectness of LoRA aggregation in existing federated fine-tuning methods. We introduce a new approach called FLORA that enables federated fine-tuning on heterogeneous LoRA adapters across clients through a novel stacking-based aggregation method. Our approach is noise-free and seamlessly supports heterogeneous LoRA adapters. Extensive experiments demonstrate FLORA' s superior performance in both homogeneous and heterogeneous settings, surpassing state-of-the-art methods. We envision this work as a milestone for efficient, privacy-preserving, and accurate federated fine-tuning of LLMs. Our code is available at https://github.com/ATP-1010/FederatedLLM.

Ziyao Wang, Zheyu Shen, Yexiao He, Guoheng Sun, Hongyi Wang, Lingjuan Lyu, Ang Li• 2024

Related benchmarks

TaskDatasetResultRank
Multi-hop Question Answering2WikiMultihopQA--
559
Multi-hop Question AnsweringHotpotQA--
294
Math ReasoningGSM8K (test)
Accuracy29.06
250
Commonsense ReasoningCommonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA)
BoolQ Accuracy76.53
223
Question AnsweringPopQA--
98
Multi-turn Conversation EvaluationMT-Bench--
68
Code GenerationHumanEval and MBPP
HumanEval Score13.41
59
Math ReasoningMATH (test)
Accuracy3.86
59
Commonsense ReasoningCommonsense Reasoning
BoolQ Accuracy64.93
29
MMLU EvaluationDolly
Accuracy28.99
24
Showing 10 of 37 rows

Other info

Code

Follow for update