CLIP model is an Efficient Continual Learner

About

The continual learning setting aims to learn new tasks over time without forgetting the previous ones. The literature reports several significant efforts to tackle this problem with limited or no access to previous task data. Among such efforts, typical solutions offer sophisticated techniques involving memory replay, knowledge distillation, model regularization, and dynamic network expansion. The resulting methods have a retraining cost at each learning task, dedicated memory requirements, and setting-specific design choices. In this work, we show that a frozen CLIP (Contrastive Language-Image Pretraining) model offers astounding continual learning performance without any fine-tuning (zero-shot evaluation). We evaluate CLIP under a variety of settings, including class-incremental, domain-incremental, and task-agnostic incremental learning, on five popular benchmarks (ImageNet-100 & 1K, CORe50, CIFAR-100, and TinyImageNet). Without any bells and whistles, the CLIP model outperforms the state-of-the-art continual learning approaches in the majority of the settings. We also show how varying the text inputs with simple prompt templates affects the CLIP model's performance. To the best of our knowledge, this is the first work to report CLIP zero-shot performance in a continual setting. We advocate the use of this strong yet embarrassingly simple baseline for future comparisons on continual learning tasks.
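
The zero-shot protocol described in the abstract can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' code: `encode_image` and `encode_text` are hypothetical stand-ins for a frozen CLIP image and text encoder, the template "a photo of a {}." is an assumed example of a simple prompt template, and the toy encoders at the bottom exist only to make the sketch runnable.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ZeroShotContinualClassifier:
    """Frozen-encoder classifier whose label set grows task by task.

    No weights are updated: "learning" a task only appends class names
    and caches the embeddings of their text prompts.
    """

    def __init__(
        self,
        encode_image: Callable[[object], Vector],  # hypothetical frozen image encoder
        encode_text: Callable[[str], Vector],      # hypothetical frozen text encoder
        template: str = "a photo of a {}.",        # assumed prompt template
    ) -> None:
        self.encode_image = encode_image
        self.encode_text = encode_text
        self.template = template
        self.class_names: List[str] = []
        self.text_embeddings: List[Vector] = []

    def add_task(self, new_class_names: Sequence[str]) -> None:
        """Extend the label set with the classes of a new task."""
        for name in new_class_names:
            if name in self.class_names:
                continue  # class already seen in an earlier task
            self.class_names.append(name)
            self.text_embeddings.append(self.encode_text(self.template.format(name)))

    def predict(self, image: object) -> str:
        """Return the seen class whose prompt embedding is closest to the image."""
        if not self.class_names:
            raise ValueError("no classes have been added yet")
        image_embedding = self.encode_image(image)
        scores = [cosine(image_embedding, t) for t in self.text_embeddings]
        return self.class_names[scores.index(max(scores))]


# Toy encoders (purely illustrative): "images" are already 3-d vectors,
# and each class word maps to a fixed axis.
TOY_TEXT_VECTORS = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "car": [0.0, 0.0, 1.0]}


def toy_encode_text(prompt: str) -> Vector:
    for word, vector in TOY_TEXT_VECTORS.items():
        if word in prompt:
            return vector
    return [0.0, 0.0, 0.0]


def toy_encode_image(image: object) -> Vector:
    return image  # the toy "image" is its own embedding


clf = ZeroShotContinualClassifier(toy_encode_image, toy_encode_text)
clf.add_task(["cat", "dog"])             # task 1
first = clf.predict([0.9, 0.1, 0.2])
clf.add_task(["car"])                    # task 2: label set grows, encoders untouched
second = clf.predict([0.1, 0.2, 0.9])
still = clf.predict([0.9, 0.1, 0.2])     # earlier class is still recognized
```

Because the encoders stay frozen, adding a task cannot overwrite anything learned earlier; the only state that changes is the list of class names and their cached prompt embeddings.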

Vishal Thengane, Salman Khan, Munawar Hayat, Fahad Khan • 2022

Related benchmarks

Task	Dataset	Metric	Value
Class-incremental learning	CIFAR-100	Averaged Incremental Accuracy	78.65	281
Class-incremental learning	ImageNet-R	Last Accuracy	76.94	147
Class-incremental learning	ImageNet-100	Last Acc	75.4	108
Image Classification	ImageNet100 (test)	Top-1 Acc	75.4	87
Continual Learning	CIFAR100 (test)	Mean Accuracy	72.6	69
Class-incremental learning	CUB200	Last Accuracy	54.8	64
Class-incremental learning	CUB200 10 Tasks	--	--	59
Continual Learning	CIFAR-100	--	--	56
Class-incremental learning	VTAB	Avg Accuracy	68.5	55
Class-incremental learning	ImageNet-R 10-task	--	--	54

Showing 10 of 59 rows
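
The metric names in the table are not defined on this page. Under the usual class-incremental convention (an assumption here, not something the page states), "Last Accuracy" is the accuracy over all seen classes after the final task, and "Averaged Incremental Accuracy" is the mean of those accuracies over every incremental step. A small sketch, using made-up per-step accuracies that are not results from the paper:

```python
from typing import Sequence


def last_accuracy(step_accuracies: Sequence[float]) -> float:
    """Accuracy over all seen classes after the final incremental step."""
    if not step_accuracies:
        raise ValueError("at least one step is required")
    return step_accuracies[-1]


def averaged_incremental_accuracy(step_accuracies: Sequence[float]) -> float:
    """Mean of the accuracies measured after each incremental step."""
    if not step_accuracies:
        raise ValueError("at least one step is required")
    return sum(step_accuracies) / len(step_accuracies)


# Hypothetical accuracies (percent) after each of three tasks; illustrative only.
steps = [90.0, 80.0, 70.0]
last = last_accuracy(steps)
average = averaged_incremental_accuracy(steps)
```

Accuracy typically drops as more classes are added, so the averaged figure is usually higher than the last one; the two should not be compared directly across rows.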
