BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping

About

Adapting pretrained vision-language models such as CLIP to various downstream tasks has attracted great interest in recent research. Previous works have proposed a variety of test-time adaptation (TTA) methods that achieve strong generalization without any knowledge of the target domain. However, existing training-required TTA approaches such as TPT rely on entropy minimization, which incurs large computational overhead, while training-free methods such as TDA overlook the potential for mining information from the test samples themselves. In this paper, we break down the design of popular training-required and training-free TTA methods and bridge the gap between them within our framework. Specifically, we maintain a lightweight key-value memory for feature retrieval from both instance-agnostic historical samples and instance-aware boosting samples. The historical samples are filtered from the test data stream and extract useful information about the target distribution, while the boosting samples are drawn via regional bootstrapping and capture knowledge of the test sample itself. We provide a theoretical justification for our method and empirically verify its effectiveness on both out-of-distribution and cross-domain datasets, demonstrating its applicability in real-world settings.
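To make the memory idea concrete, here is a minimal sketch of a key-value cache that mixes historical and boosting samples. It is not the authors' implementation, and several details are assumptions: features are precomputed, L2-normalized CLIP embeddings; the regional views (e.g. random crops) are already encoded and passed in, with row 0 being the original test image; both memories are filtered by prediction entropy; and retrieval uses a Tip-Adapter-style affinity exp(-β(1 − cos)). All names (`HistoricalCache`, `boost_predict`, `capacity`, `keep_ratio`, `alpha`, `beta`) are hypothetical.

```python
import numpy as np

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    # Shannon entropy along the class axis
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

class HistoricalCache:
    """Hypothetical instance-agnostic memory: per pseudo-class, keep the
    `capacity` lowest-entropy test features seen so far in the stream."""
    def __init__(self, num_classes, capacity=3):
        self.capacity = capacity
        self.slots = {c: [] for c in range(num_classes)}

    def update(self, feat, probs):
        c = int(probs.argmax())
        slot = self.slots[c]
        slot.append((float(entropy(probs)), feat))
        slot.sort(key=lambda t: t[0])   # lowest entropy first
        del slot[self.capacity:]        # drop the least confident entries

    def items(self):
        return [(f, c) for c, slot in self.slots.items() for _, f in slot]

def cache_logits(query, keys, labels, num_classes, beta=5.0):
    # Tip-Adapter-style retrieval (an assumption): affinity-weighted one-hot votes
    if len(keys) == 0:
        return np.zeros(num_classes)
    K = np.stack(keys)
    onehot = np.eye(num_classes)[labels]
    affinity = np.exp(-beta * (1.0 - K @ query))
    return affinity @ onehot

def boost_predict(view_feats, text_feats, cache, keep_ratio=0.5, alpha=1.0, beta=5.0):
    """view_feats: (V, D) features of the test image (row 0) and its regional views."""
    num_classes = text_feats.shape[0]
    logits_views = 100.0 * view_feats @ text_feats.T   # zero-shot CLIP-style logits
    probs_views = softmax(logits_views)
    # Instance-aware boosting samples: keep the most confident (lowest-entropy) views
    k = max(1, int(len(view_feats) * keep_ratio))
    keep = np.argsort(entropy(probs_views))[:k]
    boost_keys = [view_feats[i] for i in keep]
    boost_labels = [int(probs_views[i].argmax()) for i in keep]
    # Combine with instance-agnostic historical samples
    hist = cache.items()
    keys = [f for f, _ in hist] + boost_keys
    labels = [c for _, c in hist] + boost_labels
    query = view_feats[0]
    logits = logits_views[0] + alpha * cache_logits(query, keys, labels, num_classes, beta)
    cache.update(query, probs_views[0])  # update the stream memory after predicting
    return logits
```

In this sketch the historical cache persists across the test stream, while the boosting keys are rebuilt for every test image and then discarded, mirroring the instance-agnostic versus instance-aware split described above.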

Taolin Zhang, Jinpeng Wang, Hang Guo, Tao Dai, Bin Chen, Shu-Tao Xia · 2024

Related benchmarks

Task	Dataset	Result
Image Classification	Cross-domain Benchmark (AIR, CAL, CAR, DTD, EUR, FLWR, FOOD, PETS, SUN, UCF) (test)	AIR Accuracy: 27.45	80
Image Classification	10 fine-grained recognition datasets (Aircraft, Caltech, Cars, DTD, EuroSAT, Flower, Food101, Pets, SUN397, UCF101) (test)	Aircraft Accuracy: 27.45	64
Image Classification	ImageNet-A, -R, -S, -V2 (test)	Accuracy (ImageNet-A): 64.53	42
Test-time adaptation	Office-Home	Accuracy: 80.25	16
Image Classification	Cross-Dataset Benchmark, ViT-B/16 backbone (test)	Aircraft Accuracy: 27.45	13
Image Classification	Cross-Dataset Benchmark, ResNet50 backbone (test)	Accuracy (Aircraft): 18.93	11
Continual Test-Time Adaptation	PACS	Average Accuracy: 98.18	10
Continual Test-Time Adaptation	ImageNet-C long-term continual adaptation	Average Accuracy: 38.14	10
Continual Test-Time Adaptation	CIFAR-10-C	Average Accuracy: 75.8	10
Classification	OOD	Accuracy: 65.57	6
