DOTA: Distributional Test-Time Adaptation of Vision-Language Models

About

Vision-language foundation models (VLMs), such as CLIP, exhibit remarkable performance across a wide range of tasks. However, deploying these models can be unreliable when significant distribution gaps exist between training and test data, while fine-tuning for diverse scenarios is often costly. Cache-based test-time adapters offer an efficient alternative by storing representative test samples to guide subsequent classifications. Yet, these methods typically employ naive cache management with limited capacity, leading to severe catastrophic forgetting when samples are inevitably dropped during updates. In this paper, we propose DOTA (DistributiOnal Test-time Adaptation), a simple yet effective method addressing this limitation. Crucially, instead of merely memorizing individual test samples, DOTA continuously estimates the underlying distribution of the test data stream. Test-time posterior probabilities are then computed using these dynamically estimated distributions via Bayes' theorem for adaptation. This distribution-centric approach enables the model to continually learn and adapt to the deployment environment. Extensive experiments validate that DOTA significantly mitigates forgetting and achieves state-of-the-art performance compared to existing methods.
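The distribution-centric idea described above can be illustrated with a small sketch. The abstract does not say what form the estimated distributions take, so this example assumes per-class diagonal Gaussians over feature vectors, updated online with soft pseudo-labels (for instance, the model's zero-shot probabilities) as weights. The class name, method names, and the `var_floor` parameter are hypothetical; this is not the authors' implementation.

```python
import math


class StreamingGaussianClassifier:
    """Illustrative only: per-class diagonal Gaussians estimated from a stream.

    Assumption (not from the abstract): class-conditional densities are
    diagonal Gaussians, and each test sample updates every class in
    proportion to a soft pseudo-label.
    """

    def __init__(self, num_classes, dim, var_floor=1e-3):
        self.num_classes = num_classes
        self.dim = dim
        self.var_floor = var_floor  # keeps variances strictly positive
        self.weight = [0.0] * num_classes  # accumulated soft counts
        self.mean = [[0.0] * dim for _ in range(num_classes)]
        # Weighted sums of squared deviations (weighted Welford update).
        self.m2 = [[0.0] * dim for _ in range(num_classes)]

    def update(self, x, probs):
        """Fold one feature vector into the statistics; nothing is stored or evicted."""
        for k, w in enumerate(probs):
            if w <= 0.0:
                continue
            self.weight[k] += w
            for d in range(self.dim):
                delta = x[d] - self.mean[k][d]
                self.mean[k][d] += (w / self.weight[k]) * delta
                self.m2[k][d] += w * delta * (x[d] - self.mean[k][d])

    def _log_likelihood(self, x, k):
        total = 0.0
        for d in range(self.dim):
            var = self.m2[k][d] / self.weight[k] + self.var_floor
            diff = x[d] - self.mean[k][d]
            total -= 0.5 * (math.log(2.0 * math.pi * var) + diff * diff / var)
        return total

    def posterior(self, x, prior=None):
        """Bayes' theorem: p(k | x) is proportional to p(x | k) * p(k)."""
        if prior is None:
            prior = [1.0 / self.num_classes] * self.num_classes
        log_joint = []
        for k in range(self.num_classes):
            if self.weight[k] > 0.0 and prior[k] > 0.0:
                log_joint.append(math.log(prior[k]) + self._log_likelihood(x, k))
            else:
                log_joint.append(float("-inf"))  # class not yet observed
        top = max(log_joint)
        if top == float("-inf"):
            return list(prior)  # no class observed yet: fall back to the prior
        unnorm = [math.exp(v - top) for v in log_joint]
        total = sum(unnorm)
        return [u / total for u in unnorm]
```

Because only running sums are kept, memory use is fixed regardless of stream length and no earlier sample is dropped, in contrast to a fixed-capacity cache. In a test-time loop one would call `update(x, probs)` for each incoming feature and combine `posterior(x)` with the model's original prediction; how that combination is done is left open here.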

Zongbo Han, Jialong Yang, Guangyu Wang, Junfan Li, Qianli Xu, Mike Zheng Shou, Changqing Zhang • 2024

Related benchmarks

Task	Dataset	Result
Fine-grained visual classification	FGVC-Aircraft (test)	Top-1 Acc 25.59	312
Fine grained classification	EuroSAT	Accuracy 47.15	109
Fine grained classification	UCF101	Accuracy 65.08	81
Fine-grained Visual Categorization	FGVCAircraft	Accuracy 18.06	74
Fine grained classification	Stanford Cars	Accuracy 58.72	74
Image Classification	10 fine-grained recognition datasets (Aircraft, Caltech, Cars, DTD, EuroSAT, Flower, Food101, Pets, SUN397, UCF101) (test)	Aircraft Accuracy 26.25	64
Fine grained classification	Food101	Top-1 Acc 78.61	52
Few-shot classification	CIFAR FS (test)	--	51
Fine grained classification	Oxford Flowers 102	Accuracy 68.53	41
Fine grained classification	SUN397	Top-1 Accuracy 63.89	39

