Is Parameter Collision Hindering Continual Learning in LLMs?
About
Large Language Models (LLMs) often suffer from catastrophic forgetting when learning multiple tasks sequentially, making continual learning (CL) essential for their dynamic deployment. Existing state-of-the-art (SOTA) methods, such as O-LoRA, typically focus on constructing orthogonal task subspaces to decouple parameter interdependence across domains. In this paper, we reveal that building non-collision parameters is a more critical factor in addressing CL challenges. Our theoretical and experimental analyses demonstrate that non-collision parameters provide better task orthogonality: non-collision is a sufficient but not necessary condition for orthogonality. Furthermore, knowledge from multiple domains is preserved in non-collision parameter subspaces, making previously seen data harder to forget. Leveraging this insight, we propose Non-collision Low-Rank Adaptation (N-LoRA), a simple yet effective approach that exploits low collision rates to enhance CL in LLMs. Experimental results on multiple CL benchmarks show that N-LoRA achieves superior performance (+2.9 accuracy points), higher task orthogonality (4.1 times), and lower parameter collision (58.1 times) compared to SOTA methods.
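The "sufficient but not necessary" relationship can be illustrated with a toy example (hypothetical matrices, not taken from the paper): if two LoRA update matrices have disjoint nonzero supports (zero collision), their Frobenius inner product is zero, so they are orthogonal; but two updates can be orthogonal while still colliding on the same parameter slots.

```python
import numpy as np

# Toy 4x4 "LoRA update" matrices for two tasks (hypothetical values).
# Non-colliding: their nonzero entries occupy disjoint positions.
A = np.zeros((4, 4))
A[:2, :] = 1.0          # task 1 writes only to the top rows
B = np.zeros((4, 4))
B[2:, :] = 1.0          # task 2 writes only to the bottom rows

collision_rate = np.mean((A != 0) & (B != 0))  # fraction of shared nonzero slots
inner = abs(np.sum(A * B))                     # Frobenius inner product

print(collision_rate, inner)  # 0.0 0.0 -> zero collision implies orthogonality

# The converse fails: these two updates are orthogonal (<C, D> = 1 - 1 = 0)
# yet fully collide on their shared nonzero entries.
C = np.array([[1.0, 1.0], [0.0, 0.0]])
D = np.array([[1.0, -1.0], [0.0, 0.0]])
print(abs(np.sum(C * D)), np.mean((C != 0) & (D != 0)))  # 0.0 0.5
```

This is why low collision is the stronger property: it guarantees orthogonality and additionally keeps each task's knowledge in its own parameter slots.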
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Continual Learning | Large Number of Tasks | Average Performance | 72.4 | 50 |
| Continual Learning | Standard CL Benchmark | BWT (Avg Order 1-3) | 78.8 | 38 |
| Continual Learning | Standard CL Benchmark Order-3 | Accuracy | 78.4 | 3 |
| Continual Learning | Standard CL Benchmark Order-1 | Accuracy | 77.2 | 3 |
| Continual Learning | Standard CL Benchmark Order-2 | Accuracy | 77.3 | 3 |
| Continual Learning | Standard CL Benchmark Average | Accuracy | 77.6 | 3 |
| Continual Learning | Standard CL Benchmark | FLOPs | 84.3 | 3 |