DOTA: Distributional Test-Time Adaptation of Vision-Language Models
About
Vision-language foundation models (VLMs), such as CLIP, exhibit remarkable performance across a wide range of tasks. However, deploying these models can be unreliable when significant distribution gaps exist between training and test data, while fine-tuning for diverse scenarios is often costly. Cache-based test-time adapters offer an efficient alternative by storing representative test samples to guide subsequent classifications. Yet, these methods typically employ naive cache management with limited capacity, leading to severe catastrophic forgetting when samples are inevitably dropped during updates. In this paper, we propose DOTA (DistributiOnal Test-time Adaptation), a simple yet effective method addressing this limitation. Crucially, instead of merely memorizing individual test samples, DOTA continuously estimates the underlying distribution of the test data stream. Test-time posterior probabilities are then computed using these dynamically estimated distributions via Bayes' theorem for adaptation. This distribution-centric approach enables the model to continually learn and adapt to the deployment environment. Extensive experiments validate that DOTA significantly mitigates forgetting and achieves state-of-the-art performance compared to existing methods.
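The distribution-centric idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name a distribution family, so the code assumes each class is modeled as a diagonal-covariance Gaussian over feature embeddings, updated online with soft pseudo-label weights (for example, zero-shot class probabilities). The class `StreamingGaussianAdapter`, its methods, and the `var_floor` parameter are hypothetical names introduced only for this example.

```python
import numpy as np


class StreamingGaussianAdapter:
    """Hypothetical sketch: per-class diagonal Gaussians estimated from a test stream.

    Assumption (not stated in the abstract): each class follows a Gaussian with
    diagonal covariance, and each sample contributes to every class in
    proportion to a soft pseudo-label weight.
    """

    def __init__(self, num_classes, dim, var_floor=1e-3):
        self.count = np.zeros(num_classes)        # accumulated soft weight per class
        self.mean = np.zeros((num_classes, dim))  # running weighted mean per class
        self.m2 = np.zeros((num_classes, dim))    # weighted sum of squared deviations
        self.var_floor = var_floor                # keeps variances strictly positive

    def update(self, x, probs):
        """Fold one feature vector into the running statistics.

        Uses the weighted form of Welford's online algorithm, so no past
        samples need to be stored.
        """
        w = probs[:, None]
        self.count += probs
        safe_count = np.maximum(self.count, 1e-12)[:, None]
        delta = x - self.mean
        self.mean += (w / safe_count) * delta
        self.m2 += w * delta * (x - self.mean)

    def posterior(self, x, prior):
        """Return p(class | x) via Bayes' theorem under the Gaussian assumption."""
        safe_count = np.maximum(self.count, 1e-12)[:, None]
        var = self.m2 / safe_count + self.var_floor
        log_lik = -0.5 * np.sum((x - self.mean) ** 2 / var + np.log(var), axis=1)
        log_post = np.log(prior) + log_lik
        log_post -= log_post.max()  # subtract the max for numerical stability
        post = np.exp(log_post)
        return post / post.sum()
```

In use, each incoming test embedding would be passed to `update` together with its current class probabilities, and `posterior` would then refine predictions for later samples. Because only counts, means, and squared deviations are kept, memory stays constant however long the stream runs, which is the contrast with a fixed-capacity sample cache that the abstract draws.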
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Fine-grained visual classification | FGVC-Aircraft (test) | Top-1 Acc | 25.59 | 312 |
| Fine grained classification | EuroSAT | Accuracy | 47.15 | 81 |
| Fine-grained Visual Categorization | FGVCAircraft | Accuracy | 18.06 | 74 |
| Fine grained classification | UCF101 | Accuracy | 65.08 | 53 |
| Few-shot classification | CIFAR FS (test) | -- | -- | 51 |
| Fine grained classification | Stanford Cars | Accuracy | 58.72 | 50 |
| Fine grained classification | Food101 | Top-1 Acc | 78.61 | 42 |
| Fine grained classification | SUN397 | Top-1 Accuracy | 63.89 | 39 |
| Image Classification | ImageNet A, V, R, S (val) | ImageNet Accuracy | 70.68 | 38 |
| Fine grained classification | Oxford Flowers 102 | Accuracy | 68.53 | 31 |