Adapting in the Dark: Efficient and Stable Test-Time Adaptation for Black-Box Models
About
Test-Time Adaptation (TTA) for black-box models accessible only via APIs remains a largely unexplored challenge. Existing approaches such as post-hoc output refinement offer limited adaptive capacity. Zeroth-Order Optimization (ZOO) enables input-space adaptation, but it incurs high query costs and is difficult to optimize in the unsupervised TTA setting. We introduce BETA (Black-box Efficient Test-time Adaptation), a framework that addresses these limitations by employing a lightweight, local white-box steering model to create a tractable gradient pathway. Through a prediction harmonization technique combined with consistency regularization and prompt learning-oriented filtering, BETA enables stable adaptation with no additional API calls and negligible latency beyond standard inference. On ImageNet-C, BETA achieves a +7.1% accuracy gain on ViT-B/16 and +3.4% on CLIP, surpassing strong white-box and gray-box methods including TENT and TPT. On a commercial API, BETA achieves performance comparable to ZOO at 250x lower cost while maintaining real-time inference speed, establishing it as a practical and efficient solution for real-world black-box TTA.
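The abstract does not spell out how prediction harmonization is computed. The sketch below is a hypothetical illustration of the general idea of a "tractable gradient pathway," not BETA's actual method. It assumes harmonization is a convex mix `alpha * p_bb + (1 - alpha) * softmax(W x)`. Here `p_bb` is the black-box API's probability vector, treated as a constant with no gradient. `W` is a toy linear steering model over a feature vector `x`. The unsupervised objective is the entropy of the harmonized prediction, as in TENT-style adaptation. All names (`harmonize`, `adapt_step`, `alpha`, `lr`) are illustrative, and consistency regularization and filtering are omitted.

```python
import math


def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def harmonize(p_bb, logits, alpha):
    # Hypothetical harmonization: convex mix of the frozen black-box
    # probabilities and the local steering model's softmax output.
    s = softmax(logits)
    return [alpha * a + (1 - alpha) * b for a, b in zip(p_bb, s)], s


def entropy(p):
    return -sum(v * math.log(v) for v in p if v > 0)


def adapt_step(W, x, p_bb, alpha=0.5, lr=0.1):
    """One entropy-minimization step on the local steering model W.

    p_bb is a constant: gradients flow only through the local model,
    so no extra API call is needed. Returns (new_W, entropy_before).
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    p, s = harmonize(p_bb, logits, alpha)
    g = [-(math.log(pk) + 1.0) for pk in p]          # dH/dp_k
    gs = sum(gk * sk for gk, sk in zip(g, s))
    # Chain rule through the softmax: dH/dz_j = (1 - alpha) * s_j * (g_j - sum_k g_k s_k)
    dz = [(1 - alpha) * sj * (gj - gs) for sj, gj in zip(s, g)]
    # Linear layer: dH/dW[j][i] = dH/dz_j * x_i
    new_W = [[w - lr * dzj * xi for w, xi in zip(row, x)]
             for row, dzj in zip(W, dz)]
    return new_W, entropy(p)


if __name__ == "__main__":
    W = [[0.1, 0.0], [0.0, 0.1], [0.05, 0.05]]
    x = [1.0, 0.5]
    p_bb = [0.5, 0.3, 0.2]  # one API response, reused across steps
    for step in range(3):
        W, h = adapt_step(W, x, p_bb)
        print(f"step {step}: harmonized entropy = {h:.4f}")
```

The design point the sketch illustrates is that the API output only appears as a fixed input to the loss. Each adaptation step therefore costs local compute and no queries, unlike ZOO, which must query the API to estimate gradients.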
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet-R (test) | Accuracy | 76 | 170 |
| Image Classification | ImageNet-Sketch (test) | -- | -- | 153 |
| Image Classification | ImageNet-C Severity 5 (test) | Mean Error Rate (Severity 5) | 62.6 | 132 |
| Image Classification | ImageNet-A, R, S, V2 (test) | Accuracy (ImageNet-A) | 62.8 | 42 |
| Image Classification | ImageNet-C | Gauss Error | 59 | 36 |
| Skin lesion classification | Derm7pt | -- | -- | 15 |
| Image Classification | EuroSAT | Accuracy (%) | 53.3 | 5 |
| Image Classification | ImageNet-C, all 15 corruptions, severity 5 | Average Accuracy | 54.7 | 3 |