Consistency-Guided Asynchronous Contrastive Tuning for Few-Shot Class-Incremental Tuning of Foundation Models
About
We propose Consistency-guided Asynchronous Contrastive Tuning (CoACT), a novel method for continually tuning foundation models to learn new classes in few-shot settings. CoACT consists of three key components: (i) asynchronous contrastive tuning, which learns new classes by including LoRA modules in the pre-trained encoder while enforcing consistency between two asynchronous encoders; (ii) controlled fine-tuning, which facilitates effective tuning of a subset of the foundation model; and (iii) consistency-guided incremental tuning, which enforces additional regularization during later sessions to reduce forgetting of the learned classes. We evaluate our proposed solution on Few-Shot Class-Incremental Learning (FSCIL) as well as a new and more challenging setup called Few-Shot Class-Incremental Tuning (FSCIT), which facilitates the continual tuning of vision foundation models to learn new classes with only a few samples per class. Unlike traditional FSCIL, FSCIT does not require a large in-distribution base session for initial fully supervised training prior to the incremental few-shot sessions. We conduct extensive evaluations across 16 diverse datasets, demonstrating the effectiveness of CoACT in both FSCIL and FSCIT setups. CoACT outperforms existing methods by up to 5.02% in FSCIL and up to 12.51% in FSCIT for individual datasets, with an average improvement of 2.47%. Furthermore, CoACT exhibits reduced forgetting and enhanced robustness in low-shot experiments. Detailed ablation and sensitivity studies highlight the contribution of each component of CoACT. We make our code publicly available at https://github.com/ShuvenduRoy/CoACT-FSCIL.
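The core idea of asynchronous contrastive tuning can be illustrated with a toy sketch: a "fast" encoder is updated by gradient steps while a "slow" encoder lags behind it as an exponential moving average (EMA), and a consistency term keeps the two encoders' outputs aligned. The function names, the EMA momentum, and the toy weight vectors below are illustrative assumptions, not the released implementation:

```python
# Minimal sketch of the asynchronous-encoder idea: a fast (trainable)
# encoder and a slow encoder that tracks it via EMA, plus a simple
# cosine-based consistency loss between their embeddings.
# All names and values are illustrative, not from the CoACT codebase.

import math

def ema_update(slow, fast, momentum=0.999):
    """Move the slow encoder's weights toward the fast encoder's."""
    return [momentum * s + (1.0 - momentum) * f for s, f in zip(slow, fast)]

def cosine_consistency_loss(z_fast, z_slow):
    """1 - cosine similarity between the two encoders' embeddings."""
    dot = sum(a * b for a, b in zip(z_fast, z_slow))
    norm = math.sqrt(sum(a * a for a in z_fast)) * math.sqrt(sum(b * b for b in z_slow))
    return 1.0 - dot / norm

# Toy "weights" standing in for the trainable LoRA parameters.
fast_weights = [0.5, -1.2, 0.8]
slow_weights = [0.0, 0.0, 0.0]

# Pretend gradient steps update fast_weights; the slow encoder
# asynchronously drifts toward it after each step.
for _ in range(100):
    slow_weights = ema_update(slow_weights, fast_weights, momentum=0.99)

print(slow_weights)  # close to fast_weights scaled by (1 - 0.99**100)
```

In the actual method the consistency term regularizes the fast encoder against the slow one during incremental sessions, which is what reduces forgetting of previously learned classes.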
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Few-Shot Class-Incremental Learning | CUB-200 | Session 1 Accuracy | 86.26 | 75 |
| Few-Shot Class-Incremental Learning | miniImageNet (60 base classes, 5-way 5-shot) | Session 0 Accuracy | 97.63 | 20 |
| Few-Shot Class-Incremental Learning | CIFAR-100 (60 base classes, 5-way 5-shot) | Session 0 Accuracy | 90.46 | 20 |
| Few-Shot Class-Incremental Learning | CIFAR-100 (60 base classes, 5-way 5-shot) | Session 1 Accuracy | 88.46 | 20 |
| Few-Shot Class-Incremental Learning | CIFAR-100 (60 base classes, 5-way 5-shot) | Session 2 Accuracy | 88.11 | 20 |
| Few-Shot Class-Incremental Learning | CIFAR-100 (60 base classes, 5-way 5-shot) | Session 3 Accuracy | 86.94 | 20 |
| Few-Shot Class-Incremental Learning | CIFAR-100 (60 base classes, 5-way 5-shot) | Session 4 Accuracy | 86.98 | 20 |
| Few-Shot Class-Incremental Learning | CIFAR-100 (60 base classes, 5-way 5-shot) | Session 5 Accuracy | 86.52 | 20 |
| Few-Shot Class-Incremental Learning | CIFAR-100 (60 base classes, 5-way 5-shot) | Session 6 Accuracy | 86.39 | 20 |
| Few-Shot Class-Incremental Learning | CIFAR-100 (60 base classes, 5-way 5-shot) | Session 7 Accuracy | 86.00 | 20 |