Continual Test-Time Domain Adaptation
About
Test-time domain adaptation aims to adapt a source pre-trained model to a target domain without using any source data. Existing works mainly consider the case where the target domain is static. However, real-world machine perception systems run in non-stationary and continually changing environments where the target domain distribution can change over time. Existing methods, which are mostly based on self-training and entropy regularization, can suffer in these non-stationary environments. Due to the distribution shift over time in the target domain, pseudo-labels become unreliable. The noisy pseudo-labels can further lead to error accumulation and catastrophic forgetting. To tackle these issues, we propose a continual test-time adaptation approach (CoTTA) which comprises two parts. First, we propose to reduce error accumulation by using weight-averaged and augmentation-averaged predictions, which are often more accurate. Second, to avoid catastrophic forgetting, we propose to stochastically restore a small part of the neurons to the source pre-trained weights during each iteration to help preserve source knowledge in the long term. The proposed method enables long-term adaptation of all parameters in the network. CoTTA is easy to implement and can be readily incorporated into off-the-shelf pre-trained models. We demonstrate the effectiveness of our approach on four classification tasks and a segmentation task for continual test-time adaptation, on which we outperform existing methods. Our code is available at https://qin.ee/cotta.
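The three mechanisms named in the abstract (weight averaging, augmentation averaging, and stochastic restoration) can be illustrated with a minimal sketch. This is not the authors' implementation: it operates on flat lists of floats rather than network tensors, and the function names, the smoothing factor `alpha`, and the restore probability `p` are all assumptions chosen for illustration.

```python
import random


def ema_update(teacher, student, alpha=0.999):
    """Weight-averaged teacher: alpha * teacher + (1 - alpha) * student.

    `alpha` is an illustrative value, not one stated in the abstract.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def augmentation_averaged(predict, x, augmentations):
    """Average the prediction vectors obtained from augmented views of x.

    `predict` maps an input to a list of class scores; `augmentations`
    is a list of callables. Both are hypothetical stand-ins.
    """
    preds = [predict(aug(x)) for aug in augmentations]
    n = len(preds)
    return [sum(column) / n for column in zip(*preds)]


def stochastic_restore(current, source, p=0.01, rng=random):
    """Reset each weight to its source value independently with probability p.

    `p` is an illustrative value; the abstract only says "a small part".
    """
    return [s if rng.random() < p else c for c, s in zip(current, source)]
```

In an adaptation loop, one would presumably compute pseudo-labels from the averaged teacher predictions, take a gradient step on the student, call `ema_update`, and then apply `stochastic_restore` to the student weights, so that a small random subset is pulled back to the source model at every iteration.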
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Semantic segmentation | Cityscapes | mIoU: 48 | 658 |
| Image Classification | ImageNet A | Top-1 Acc: 52.2 | 654 |
| Image Classification | ImageNet-Sketch | Top-1 Accuracy: 50 | 407 |
| Semantic segmentation | ScanNet (val) | -- | 274 |
| Image Classification | ImageNet-R | Accuracy: 63.5 | 217 |
| Image Classification | CIFAR-10C Severity Level 5 (test) | Average Error Rate (Severity 5): 16.2 | 127 |
| 3D Human Pose Estimation | 3DPW | PA-MPJPE: 50.5 | 127 |
| Video Semantic Segmentation | VSPW (val) | mIoU: 49.4 | 121 |
| Image Classification | ImageNet-R (test) | -- | 118 |
| Image Classification | ImageNet-C (test) | mCE (Mean Corruption Error): 54.8 | 116 |