CLIP model is an Efficient Continual Learner
About
The continual learning setting aims to learn new tasks over time without forgetting the previous ones. The literature reports several significant efforts to tackle this problem with limited or no access to previous task data. Among such efforts, typical solutions offer sophisticated techniques involving memory replay, knowledge distillation, model regularization, and dynamic network expansion. The resulting methods incur a retraining cost at each learning task, dedicated memory requirements, and setting-specific design choices. In this work, we show that a frozen CLIP (Contrastive Language-Image Pretraining) model offers astounding continual learning performance without any fine-tuning (zero-shot evaluation). We evaluate CLIP under a variety of settings, including class-incremental, domain-incremental, and task-agnostic incremental learning, on five popular benchmarks (ImageNet-100 & 1K, CORe50, CIFAR-100, and TinyImageNet). Without any bells and whistles, the CLIP model outperforms the state-of-the-art continual learning approaches in the majority of the settings. We also show how varying the text inputs with simple prompt templates affects the CLIP model's performance. To the best of our knowledge, this is the first work to report the CLIP zero-shot performance in a continual setting. We advocate the use of this strong yet embarrassingly simple baseline for future comparisons on continual learning tasks.
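The zero-shot, class-incremental evaluation described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes that each new task only adds class names to the label set, that every class name is turned into a text prompt with a simple template, and that an image is assigned to the class whose text embedding has the highest cosine similarity to the image embedding. The encoder here (`toy_encode_text`, `TOY_VOCAB`) and the class `ZeroShotContinualClassifier` are hypothetical stand-ins so the example runs without model weights; in a real setup the text and image features would come from a frozen CLIP model.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Hypothetical stand-in for a frozen text encoder (NOT CLIP): each vocabulary
# word owns one axis of a tiny embedding space.
TOY_VOCAB = ["cat", "dog", "truck", "ship"]


def toy_encode_text(prompt):
    words = prompt.lower().replace(".", "").split()
    return [1.0 if word in words else 0.1 for word in TOY_VOCAB]


class ZeroShotContinualClassifier:
    """Grows its label set task by task; no parameters are ever updated."""

    def __init__(self, encode_text, template="a photo of a {}."):
        self.encode_text = encode_text
        self.template = template
        self.class_names = []
        self.text_features = []

    def add_task(self, new_class_names):
        # "Learning" a task only means embedding prompts for its class names.
        for name in new_class_names:
            if name in self.class_names:
                continue
            prompt = self.template.format(name)
            self.class_names.append(name)
            self.text_features.append(normalize(self.encode_text(prompt)))

    def predict(self, image_feature):
        # Pick the class seen so far with the highest cosine similarity.
        image = normalize(image_feature)
        scores = [
            sum(a * b for a, b in zip(image, text))
            for text in self.text_features
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.class_names[best]


if __name__ == "__main__":
    clf = ZeroShotContinualClassifier(toy_encode_text)
    cat_image = [0.9, 0.2, 0.1, 0.1]    # toy "image embedding" near the cat axis
    truck_image = [0.1, 0.1, 0.8, 0.3]  # toy "image embedding" near the truck axis

    clf.add_task(["cat", "dog"])        # task 1
    print(clf.predict(cat_image))

    clf.add_task(["truck", "ship"])     # task 2 extends the label set
    print(clf.predict(truck_image))
    print(clf.predict(cat_image))       # earlier classes remain available
```

Because the encoders stay frozen and a new task only appends text embeddings, this sketch needs no replay memory and no retraining step. Changing the `template` argument is one way to probe the prompt sensitivity the abstract mentions.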
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Class-incremental learning | CIFAR-100 | Averaged Incremental Accuracy | 78.65 | 234 |
| Class-incremental learning | ImageNet-R | Average Accuracy | 84.43 | 103 |
| Class-incremental learning | ImageNet-100 | Avg Acc | 83.99 | 74 |
| Continual Learning | CIFAR-100 | -- | -- | 56 |
| Image Classification | ImageNet100 (test) | Top-1 Acc | 75.4 | 41 |
| Class-incremental learning | CUB200 | Last Accuracy | 54.8 | 39 |
| Continual Learning | CIFAR100 (test) | Mean Accuracy | 72.6 | 31 |
| Class-incremental learning | VTAB | Avg Accuracy | 68.5 | 31 |
| Continual Learning | ImageNet-R (test) | Accuracy | 76.94 | 20 |
| Continual Learning | ImageNet-100 (test) | Task 10 Accuracy | 75.4 | 17 |