Maintaining Discrimination and Fairness in Class Incremental Learning
About
Deep neural networks (DNNs) have been applied in class incremental learning, which aims to solve the common real-world problem of learning new classes continually. One drawback of standard DNNs is that they are prone to catastrophic forgetting. Knowledge distillation (KD) is a commonly used technique to alleviate this problem. In this paper, we demonstrate that it can indeed help the model output more discriminative results within old classes. However, it cannot alleviate the model's tendency to classify objects into new classes, which hides and limits the positive effect of KD. We observe that an important factor causing catastrophic forgetting is that the weights in the last fully connected (FC) layer are highly biased in class incremental learning. Motivated by these observations, we propose a simple and effective solution to address catastrophic forgetting. First, we utilize KD to maintain the discrimination within old classes. Then, to further maintain the fairness between old classes and new classes, we propose Weight Aligning (WA), which corrects the biased weights in the FC layer after the normal training process. Unlike previous work, WA does not require any extra parameters or a validation set in advance, as it utilizes the information provided by the biased weights themselves. The proposed method is evaluated on ImageNet-1000, ImageNet-100, and CIFAR-100 under various settings. Experimental results show that the proposed method can effectively alleviate catastrophic forgetting and significantly outperform state-of-the-art methods.
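The abstract does not spell out how Weight Aligning corrects the FC layer, so the following is only a minimal sketch under stated assumptions: each class owns one row of the FC weight matrix, old classes come first, and the correction rescales the new-class rows so that their mean L2 norm matches the mean L2 norm of the old-class rows. The function names `row_norms` and `align_weights` are hypothetical and chosen for illustration; the sketch uses plain Python lists rather than any particular deep learning framework.

```python
import math


def row_norms(weights):
    """Return the L2 norm of each class's weight vector (one row per class)."""
    return [math.sqrt(sum(w * w for w in row)) for row in weights]


def align_weights(weights, num_old_classes):
    """Rescale new-class rows so their mean norm matches the old-class mean norm.

    Assumptions (not taken from the abstract):
      * `weights` is a list of rows, one per class, old classes first.
      * The correction is a single scalar applied to every new-class row.
    Returns (aligned_weights, gamma); the input list is not modified.
    """
    num_classes = len(weights)
    if not 0 < num_old_classes < num_classes:
        raise ValueError("need at least one old class and one new class")

    norms = row_norms(weights)
    mean_old = sum(norms[:num_old_classes]) / num_old_classes
    mean_new = sum(norms[num_old_classes:]) / (num_classes - num_old_classes)

    # Scalar that shrinks (or grows) the new-class weights to the old-class scale.
    gamma = mean_old / mean_new

    old_rows = [list(row) for row in weights[:num_old_classes]]
    new_rows = [[gamma * w for w in row] for row in weights[num_old_classes:]]
    return old_rows + new_rows, gamma


if __name__ == "__main__":
    # Two old classes with norm 5, two new classes with norm 10.
    fc = [[3.0, 4.0], [0.0, 5.0], [6.0, 8.0], [0.0, 10.0]]
    aligned, gamma = align_weights(fc, num_old_classes=2)
    print(gamma)
    print(row_norms(aligned))
```

Because the scaling factor is computed from the trained weights alone, a correction of this kind needs no extra learnable parameters and no held-out validation data, which is consistent with the property the abstract claims for WA.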
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Class-incremental learning | CIFAR-100 | Averaged Incremental Accuracy | 74.09 | 234 |
| Class-incremental learning | CIFAR100 (test) | Avg Acc | 69.28 | 76 |
| Class-incremental learning | CIFAR-100 10 (test) | Average Top-1 Accuracy | 69.46 | 75 |
| Class-incremental learning | ImageNet-100 | Avg Acc | 80.21 | 74 |
| Class-incremental learning | CIFAR100 B50 (test) | Average Accuracy | 71.43 | 67 |
| Continual Learning | CIFAR100 Split 32x32 (test) | Accuracy | 24 | 66 |
| Continual Learning | MiniImageNet Split 84x84 (test) | Accuracy | 18.9 | 66 |
| Continual Learning | Split CIFAR10 32x32 (test) | Accuracy | 48.6 | 66 |
| Class-incremental learning | CIFAR-100 | Average Accuracy | 70 | 60 |
| Class-incremental learning | CIFAR100-LT rho=100 (test) | Avg Acc | 32.07 | 48 |