Fair Context Learning for Evidence-Balanced Test-Time Adaptation in Vision-Language Models
About
Vision-Language Models (VLMs) such as CLIP enable strong zero-shot recognition but degrade substantially under distribution shift. Test-Time Adaptation (TTA) aims to improve robustness using only unlabeled test samples, yet most prompt-based TTA methods rely on entropy minimization, an approach that can amplify spurious correlations and induce overconfident errors when classes share visual features. We propose Fair Context Learning (FCL), an episodic TTA framework that avoids entropy minimization by explicitly addressing shared-evidence bias. Motivated by our additive evidence decomposition assumption, FCL decouples adaptation into (i) augmentation-based exploration to identify plausible class candidates, and (ii) fairness-driven calibration that adapts text contexts to equalize sensitivity to common visual evidence. This fairness constraint mitigates partial feature obsession and enables effective calibration of text embeddings without relying on entropy reduction. Through extensive evaluation, we empirically validate our theoretical motivation and show that FCL achieves competitive adaptation performance relative to state-of-the-art TTA methods across diverse domain-shift and fine-grained benchmarks.
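The two-stage adaptation above can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the embeddings are random stand-ins for CLIP features, `k` and the use of the mean candidate embedding as the "common visual evidence" direction are illustrative assumptions, and the calibration step simply equalizes each candidate's projection on that direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins (shapes and names are illustrative):
# feats: embeddings of N augmented views of one test image, L2-normalized
# text:  one text embedding per class, L2-normalized
num_views, num_classes, dim = 8, 5, 16
feats = rng.normal(size=(num_views, dim))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
text = rng.normal(size=(num_classes, dim))
text /= np.linalg.norm(text, axis=1, keepdims=True)

# (i) Augmentation-based exploration: average per-view class
# probabilities and keep the top-k classes as plausible candidates.
logits = feats @ text.T                              # (views, classes)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
mean_probs = probs.mean(axis=0)
k = 3
candidates = np.argsort(mean_probs)[-k:]

# (ii) Fairness-driven calibration (sketch): treat the mean candidate
# embedding as the shared-evidence direction, then shift each candidate
# so all candidates respond equally to that direction.
shared = text[candidates].mean(axis=0)
shared /= np.linalg.norm(shared)
proj = text[candidates] @ shared                     # per-candidate sensitivity
target = proj.mean()                                 # equalized sensitivity
calibrated = text.copy()
calibrated[candidates] += np.outer(target - proj, shared)

# After calibration, every candidate has the same projection on `shared`.
print(np.allclose(calibrated[candidates] @ shared, target))  # True
```

In this toy form, the fairness constraint is exact equality of the candidates' sensitivity to the shared direction; the paper's calibration operates on learnable text contexts rather than directly editing embeddings.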
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Classification | ImageNet-R | Top-1 Acc: 77.78 | 474 |
| Fine-grained Classification | Aircraft | Top-1 Acc: 34.17 | 62 |
| Fine-grained Classification | EuroSAT | Accuracy: 43.69 | 57 |
| Image Classification | ImageNet-A | Accuracy: 61.38 | 50 |
| Fine-grained Classification | UCF101 | Accuracy: 68.52 | 34 |
| Image Classification | ImageNet-V | -- | 31 |
| Fine-grained Classification | Food101 | -- | 30 |
| Fine-grained Classification | SUN397 | Top-1 Acc: 65.71 | 25 |
| Fine-grained Classification | Pets | Accuracy: 86.54 | 22 |
| Fine-grained Classification | Cars | Top-1 Acc: 68.58 | 20 |