RDumb: A simple approach that questions our progress in continual test-time adaptation

About

Test-Time Adaptation (TTA) allows pre-trained models to be updated to changing data distributions at deployment time. While early work tested these algorithms on individual fixed distribution shifts, recent work has proposed and applied methods for continual adaptation over long timescales. To examine the reported progress in the field, we propose the Continually Changing Corruptions (CCC) benchmark to measure the asymptotic performance of TTA techniques. We find that eventually all but one of the state-of-the-art methods collapse and perform worse than a non-adapting model, including models specifically proposed to be robust to performance collapse. In addition, we introduce a simple baseline, "RDumb", that periodically resets the model to its pre-trained state. RDumb performs better than or on par with the previously proposed state-of-the-art in all considered benchmarks. Our results show that previous TTA approaches are neither effective at regularizing adaptation to avoid collapse nor able to outperform a simplistic resetting strategy.
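The resetting idea behind RDumb can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `PeriodicResetAdapter`, the `reset_every` parameter, the dictionary-of-floats "model", and the toy `drifting_step` update are all assumptions made for illustration. The abstract does not state which adaptation rule or reset interval RDumb uses, so the sketch accepts any adaptation step as a callable.

```python
import copy
from typing import Callable, Dict

# Hypothetical stand-in for model weights; a real model would hold tensors.
Params = Dict[str, float]


class PeriodicResetAdapter:
    """Wraps an arbitrary test-time adaptation step and restores the
    pretrained parameters after every `reset_every` adaptation steps."""

    def __init__(
        self,
        pretrained: Params,
        adapt_step: Callable[[Params, float], Params],
        reset_every: int,
    ) -> None:
        if reset_every < 1:
            raise ValueError("reset_every must be >= 1")
        # Keep an untouched copy of the pretrained state to reset to.
        self._pretrained = copy.deepcopy(pretrained)
        self.params = copy.deepcopy(pretrained)
        self._adapt_step = adapt_step
        self._reset_every = reset_every
        self.steps_since_reset = 0

    def step(self, batch: float) -> Params:
        # Discard everything learned since the last reset once the
        # interval has elapsed, then continue adapting as usual.
        if self.steps_since_reset == self._reset_every:
            self.params = copy.deepcopy(self._pretrained)
            self.steps_since_reset = 0
        self.params = self._adapt_step(self.params, batch)
        self.steps_since_reset += 1
        return self.params


def drifting_step(params: Params, batch: float) -> Params:
    # Toy update (an assumption): the parameter drifts a little on every
    # batch, mimicking adaptation that accumulates without bound.
    return {"bias": params["bias"] + 0.1 * batch}


adapter = PeriodicResetAdapter({"bias": 0.0}, drifting_step, reset_every=3)
history = [round(adapter.step(1.0)["bias"], 2) for _ in range(7)]
print(history)  # [0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.1]
```

In the toy run the parameter never drifts further than three steps away from its pretrained value, which is the property the resetting strategy relies on: any collapse that builds up during adaptation is bounded by the reset interval.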

Ori Press, Steffen Schneider, Matthias Kümmerer, Matthias Bethge • 2023

Related benchmarks

Task	Dataset	Result
Image Classification	ImageNet-C Severity 5 (test)	Mean Error Rate (Severity 5): 72.2	216
Image Classification	CIFAR-10C Severity Level 5 (test)	Average Error Rate (Severity 5): 31.1	136
Image Classification	CINIC-10 iid (test)	Test Accuracy: 48.14	34
Continual Test-Time Adaptation	CCC Hard	Error (%): 76.5	32
Continual Test-Time Adaptation	CCC Easy	Error (%): 36.8	32
Continual Test-Time Adaptation	CCC Medium	Error (%): 43.06	32
Image Classification	CIFAR-100-C Severity 5	mCE: 36.7	26
Image Classification	CCC	Accuracy (Easy): 49.3	18
Image Classification	CCC Hard	Accuracy: 23.5	16
Image Classification	CCC Easy	Accuracy: 63.2	16

Showing 10 of 24 rows
