Merge before Forget: A Single LoRA Continual Learning via Continual Merging

About

Parameter-efficient continual learning has emerged as a promising approach for large language models (LLMs) to mitigate catastrophic forgetting while enabling adaptation to new tasks. Current Low-Rank Adaptation (LoRA) continual learning techniques often retain and freeze previously learned LoRAs or generate data representations to overcome forgetting, typically using these to help new LoRAs learn new tasks. However, these methods not only ignore the memory cost that grows with the number of tasks and the limits of storage space, but also suffer from potential task interference due to the lack of effective LoRA merging mechanisms. In this paper, we propose a novel continual learning method that orthogonally initializes LoRA updates and sequentially merges them into a single unified LoRA. Our method leverages orthogonal basis extraction from the previously learned LoRA to initialize the learning of new tasks, and further exploits the intrinsic asymmetry of LoRA components by using a time-aware scaling mechanism to balance new and old knowledge during continual merging. Our approach maintains constant memory complexity with respect to the number of tasks, minimizes interference between past and new tasks via orthogonal basis initialization, and improves performance over asymmetric LoRA merging via adaptive scaling. We provide theoretical analysis to justify our design and conduct extensive experiments across diverse continual learning benchmarks using various Llama models, demonstrating the effectiveness and efficiency of our method.
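The abstract names two ingredients, orthogonal initialization from the previous LoRA and time-aware scaling during merging, without giving formulas. The NumPy sketch below is only an illustration of how such a pipeline could look, not the authors' algorithm. Everything specific in it is an assumption: the function names `orthogonal_init` and `merge_time_aware`, the 1/t blending weight, and the truncated-SVD step used to keep a single rank-r LoRA.

```python
import numpy as np


def orthogonal_init(A_prev, rng):
    """Return a new A (r x d) with orthonormal rows that are orthogonal
    to the row space of A_prev. Assumes d >= 2 * r."""
    r, d = A_prev.shape
    # Orthonormal basis (columns of Q) covering the row space of the old A.
    Q, _ = np.linalg.qr(A_prev.T)            # Q has shape (d, r)
    G = rng.standard_normal((d, r))
    G -= Q @ (Q.T @ G)                       # remove components in the old subspace
    Q_new, _ = np.linalg.qr(G)               # orthonormalize what remains
    return Q_new.T


def merge_time_aware(B_old, A_old, B_new, A_new, t):
    """Blend the t-th task's update into the running LoRA, then truncate
    back to rank r so memory stays constant.
    The 1/t weight is a hypothetical schedule, not taken from the paper."""
    r = A_old.shape[0]
    lam = 1.0 / t
    delta = (1.0 - lam) * (B_old @ A_old) + lam * (B_new @ A_new)
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    B = U[:, :r] * S[:r]                     # fold singular values into B
    A = Vt[:r]
    return B, A


rng = np.random.default_rng(0)
d_out, d_in, r = 16, 32, 4

# Task 1: a previously learned LoRA (random stand-ins for trained weights).
A1 = rng.standard_normal((r, d_in))
B1 = rng.standard_normal((d_out, r))

# Task 2: start A in a subspace orthogonal to task 1, then "train" B (random here).
A2 = orthogonal_init(A1, rng)
B2 = rng.standard_normal((d_out, r))

# Merge into a single rank-r LoRA.
B, A = merge_time_aware(B1, A1, B2, A2, t=2)
```

After the merge only one pair `(B, A)` of fixed size is stored, which is the constant-memory property the abstract describes. The truncation discards part of the blended update, so a real method has to choose carefully what is kept.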

Fuli Qiao, Mehrdad Mahdavi • 2025

Related benchmarks

Task	Dataset	Result
Continual Learning	Standard CL Benchmark	Avg Final Acc 0.804	71
Continual Learning	Large Number of Tasks	Average Performance 74.8	50
Continual Learning	SuperNI Benchmark	Average Score 37.2	14
Continual Learning	Large Number of Tasks (test)	Backward Transfer (BWT) -3.5	13
Continual Learning	SuperNI Standard CL Benchmark (test)	Average Performance 81	13
Continual Learning	SuperNI Large Number of Tasks (test)	Average Performance 76.2	13
Continual Learning Classification	Standard CL benchmark (4 tasks)	Average Accuracy (AA) 80.4	9
Continual Learning	SuperNI	--	9
Continual Learning	Large Number of Tasks	MOPD 15.17	4
Continual Learning	Standard CL Benchmark	MOPD 1.72	4
