Gradient Projection Memory for Continual Learning

About

The ability to learn continually without forgetting the past tasks is a desired attribute for artificial learning systems. Existing approaches to enable such learning in artificial neural networks usually rely on network growth, importance based weight update or replay of old data from the memory. In contrast, we propose a novel approach where a neural network learns new tasks by taking gradient steps in the orthogonal direction to the gradient subspaces deemed important for the past tasks. We find the bases of these subspaces by analyzing network representations (activations) after learning each task with Singular Value Decomposition (SVD) in a single shot manner and store them in the memory as Gradient Projection Memory (GPM). With qualitative and quantitative analyses, we show that such orthogonal gradient descent induces minimum to no interference with the past tasks, thereby mitigates forgetting. We evaluate our algorithm on diverse image classification datasets with short and long sequences of tasks and report better or on-par performance compared to the state-of-the-art approaches.

Gobinda Saha, Isha Garg, Kaushik Roy• 2021

Related benchmarks

Task	Dataset	Result
Language Understanding	MMLU	Accuracy15.45	844
Reasoning	BBH	--	726
Physical Commonsense Reasoning	PIQA	Accuracy53.48	696
Class-incremental learning	CIFAR-100	Averaged Incremental Accuracy91.31	281
Class-incremental learning	ImageNet-R 5-task	Avg Accuracy (A_bar)81.16	64
Continual Learning	ImageNet-R 10 tasks	Average ACC@1073.34	64
Continual Learning	CIFAR-100 (10-split)	ACC72.48	54
Class-incremental learning	ImageNet-R 20-task	Average Accuracy64.06	52
Continual Image Classification	MiniImageNet Split	Accuracy69.46	42
Exemplar-Free Class-Incremental Learning	CIFAR-100 Big start	Average Incremental Accuracy (Aavg)41.51	39

Showing 10 of 32 rows

Other info

Follow for update

@wizwand_team Discord