Representation Calibration and Uncertainty Guidance for Class-Incremental Learning based on Vision Language Model
About
Class-incremental learning requires a learning system to continually learn new classes while preserving previously learned knowledge of old classes. Current state-of-the-art methods based on Vision-Language Models (VLMs) still struggle to differentiate classes across learning tasks. To address this, a novel VLM-based continual learning framework for image classification is proposed. In this framework, task-specific adapters are added to the pre-trained and frozen image encoder to learn new knowledge, and a novel cross-task representation calibration strategy based on a mixture of light-weight projectors helps separate all learned classes in a unified feature space, alleviating class confusion across tasks. In addition, a novel inference strategy guided by prediction uncertainty is developed to more accurately select the most appropriate image feature for class prediction. Extensive experiments on multiple datasets under various settings demonstrate the superior performance of our method compared to existing ones.
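The abstract does not spell out how prediction uncertainty guides feature selection, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each task-specific adapter yields one candidate image feature, that class scores are cosine similarities to class text embeddings, and that uncertainty is measured by the entropy of the softmax distribution. The function name `predict_with_lowest_uncertainty`, the `temperature` value, and the entropy criterion are all illustrative assumptions.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def entropy(probs, axis=-1, eps=1e-12):
    """Shannon entropy of each probability vector (higher = more uncertain)."""
    return -(probs * np.log(probs + eps)).sum(axis=axis)


def predict_with_lowest_uncertainty(candidate_features, class_embeddings,
                                    temperature=0.01):
    """Pick the candidate feature whose class prediction is least uncertain.

    candidate_features: (num_adapters, dim), one image feature per adapter
                        (hypothetical layout, assumed for this sketch).
    class_embeddings:   (num_classes, dim), one embedding per learned class.
    Returns (predicted_class, chosen_adapter, per_adapter_entropy).
    """
    # L2-normalise so that the dot product is a cosine similarity.
    feats = candidate_features / np.linalg.norm(
        candidate_features, axis=1, keepdims=True)
    classes = class_embeddings / np.linalg.norm(
        class_embeddings, axis=1, keepdims=True)

    logits = feats @ classes.T / temperature   # (num_adapters, num_classes)
    probs = softmax(logits, axis=1)
    uncertainties = entropy(probs, axis=1)     # (num_adapters,)

    best = int(np.argmin(uncertainties))       # least uncertain candidate
    return int(np.argmax(probs[best])), best, uncertainties


if __name__ == "__main__":
    # Toy data: three classes, two adapters. The first candidate is equally
    # similar to every class; the second clearly matches class 1.
    class_embeddings = np.eye(3)
    candidates = np.array([[1.0, 1.0, 1.0],
                           [0.0, 5.0, 0.1]])
    pred, adapter, unc = predict_with_lowest_uncertainty(
        candidates, class_embeddings)
    print(pred, adapter, unc)
```

In this toy case the second candidate is selected because its prediction distribution is sharply peaked, while the first candidate is maximally ambiguous. Other uncertainty measures (for example, maximum softmax probability) could be substituted for entropy without changing the overall shape of the sketch.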
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Class-incremental learning | CIFAR-100 | Average Accuracy | 90.13 | 60 |
| Class-incremental learning | ImageNet-R 10-task | -- | -- | 44 |
| Class-incremental learning | ImageNet-R 20-task | Average Accuracy | 88.21 | 33 |
| Class-incremental learning | CIFAR100 10 Tasks | Accuracy | 89.6 | 29 |
| Class-incremental learning | ImageNet-R 5-task | Avg Accuracy (A_bar) | 88.42 | 27 |
| Class-incremental learning | CIFAR-100 20 tasks | Avg Acc | 87.11 | 26 |
| Class-incremental learning | Mini-ImageNet100 5-task setting | Accuracy (Last Task) | 94.86 | 12 |
| Class-incremental learning | Mini-ImageNet100 (10-task setting) | Last Accuracy | 94.38 | 12 |
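The benchmark rows report both average and last-task accuracy, but this page does not define them. A common convention in class-incremental learning, assumed here, is that accuracy is measured on all classes seen so far after each task; "average accuracy" is the mean of those values and "last accuracy" is the value after the final task. The sketch below illustrates that convention with made-up numbers; the function name and inputs are hypothetical.

```python
def summarize_incremental_accuracy(accuracy_after_each_task):
    """Return (average_accuracy, last_accuracy) for one incremental run.

    accuracy_after_each_task: accuracies (in percent) on all classes seen
    so far, measured after each learning task, in task order.
    """
    if not accuracy_after_each_task:
        raise ValueError("need at least one per-task accuracy")
    average = sum(accuracy_after_each_task) / len(accuracy_after_each_task)
    last = accuracy_after_each_task[-1]
    return average, last


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    average, last = summarize_incremental_accuracy([90.0, 80.0, 70.0])
    print(f"average accuracy: {average:.2f}, last accuracy: {last:.2f}")
```

Under this convention the average is typically higher than the last accuracy, since early evaluations cover fewer classes.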