Entropy is not Enough for Test-Time Adaptation: From the Perspective of Disentangled Factors

About

Test-time adaptation (TTA) fine-tunes pre-trained deep neural networks on unseen test data. The primary challenge of TTA is that the model cannot access the entire test dataset during online updates, which causes error accumulation. To mitigate this, TTA methods have used the entropy of the model output as a confidence metric to determine which samples are less likely to cause errors. Through experimental studies, however, we observed that entropy is unreliable as a confidence metric for TTA under biased scenarios, and we theoretically showed that this unreliability stems from neglecting the influence of latent disentangled factors of the data on predictions. Building on these findings, we introduce a novel TTA method named Destroy Your Object (DeYO), which leverages a newly proposed confidence metric named Pseudo-Label Probability Difference (PLPD). PLPD quantifies the influence of an object's shape on the prediction by measuring the difference between the predictions before and after applying an object-destructive transformation. DeYO consists of sample selection and sample weighting, both of which use entropy and PLPD together. For robust adaptation, DeYO prioritizes samples whose predictions rely predominantly on shape information. Our extensive experiments demonstrate the consistent superiority of DeYO over baseline methods across various scenarios, including biased and wild ones. The project page is publicly available at https://whitesnowdrop.github.io/DeYO/.

Jonghyun Lee, Dahuin Jung, Saehyung Lee, Junsung Park, Juhyeon Shin, Uiwon Hwang, Sungroh Yoon • 2024
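
To make the abstract's description of PLPD concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. Several pieces are assumptions made for illustration: patch shuffling stands in for the "object-destructive transformation", the thresholds `ent_max` and `plpd_min` are hypothetical parameters, and the weighting formula is just one simple choice that grows as entropy falls and PLPD rises. The logits would come from the model being adapted; here they are passed in as plain lists.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def patch_shuffle(image, patches_per_side, rng):
    """Split a square 2D image (list of rows) into a grid of patches and permute them.

    Assumption: patch shuffling is used here as the object-destructive
    transformation; it scrambles global shape but keeps every pixel value.
    The image side length must be divisible by patches_per_side.
    """
    size = len(image)
    p = size // patches_per_side  # side length of one patch
    coords = [(r, c) for r in range(patches_per_side) for c in range(patches_per_side)]
    shuffled = coords[:]
    rng.shuffle(shuffled)
    out = [[0] * size for _ in range(size)]
    for (dr, dc), (sr, sc) in zip(coords, shuffled):
        for i in range(p):
            for j in range(p):
                out[dr * p + i][dc * p + j] = image[sr * p + i][sc * p + j]
    return out


def plpd(logits_original, logits_destroyed):
    """Pseudo-label probability before the transformation minus after it."""
    p_orig = softmax(logits_original)
    p_dest = softmax(logits_destroyed)
    # The pseudo-label is the class predicted on the untransformed input.
    pseudo_label = max(range(len(p_orig)), key=p_orig.__getitem__)
    return p_orig[pseudo_label] - p_dest[pseudo_label]


def select_and_weight(logits_original, logits_destroyed, ent_max, plpd_min):
    """Return a weight for the sample, or 0.0 if it is filtered out.

    Hypothetical rule: keep samples with low entropy AND high PLPD.
    The weight below is illustrative only, not the paper's formula.
    """
    ent = entropy(softmax(logits_original))
    diff = plpd(logits_original, logits_destroyed)
    if ent >= ent_max or diff <= plpd_min:
        return 0.0
    return math.exp(-ent) + math.exp(diff)
```

In use, one would run the model on an image and on `patch_shuffle(image, ...)`, then pass both logit vectors to `select_and_weight`. A confident prediction that collapses once the object's shape is destroyed receives a large PLPD and is kept, whereas a confident prediction that survives the shuffle (suggesting reliance on texture or background) receives a PLPD near zero and is filtered out.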

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Image Classification	ImageNet-A	Top-1 Acc	54.1	698
Image Classification	ImageNet-R	Top-1 Acc	60.3	581
Image Classification	ImageNet-Sketch	Top-1 Accuracy	52.2	473
Image Classification	PACS	Overall Average Accuracy	76.67	270
Image Classification	ImageNet-R	Accuracy	66.1	217
Image Classification	Waterbirds	Average Accuracy	87.42	209
Image Classification	CIFAR-10C Severity Level 5 (test)	Average Error Rate (Severity 5)	76.65	136
Image Classification	ImageNet-C Severity 5 (test)	Mean Error Rate (Severity 5)	26.46	132
Image Classification	PACS	Accuracy	75.16	130
Image Classification	ImageNet-C (test)	Defocus Blur Acc	56	125

Showing 10 of 57 rows
