
MEMO: Test Time Robustness via Adaptation and Augmentation

About

While deep neural networks can attain good accuracy on in-distribution test points, many applications require robustness even in the face of unexpected perturbations in the input, changes in the domain, or other sources of distribution shift. We study the problem of test-time robustification, i.e., using the test input to improve model robustness. Recent works have proposed methods for test-time adaptation; however, each introduces additional assumptions, such as access to multiple test points, that prevent widespread adoption. In this work, we aim to study and devise methods that make no assumptions about the model training process and are broadly applicable at test time. We propose a simple approach that can be used in any test setting where the model is probabilistic and adaptable: when presented with a test example, perform different data augmentations on the data point, and then adapt (all of) the model parameters by minimizing the entropy of the model's average, or marginal, output distribution across the augmentations. Intuitively, this objective encourages the model to make the same prediction across different augmentations, thus enforcing the invariances encoded in these augmentations, while also maintaining confidence in its predictions. In our experiments, we evaluate two baseline ResNet models, two robust ResNet-50 models, and a robust vision transformer model, and we demonstrate that this approach achieves accuracy gains of 1-8% over standard model evaluation and also generally outperforms prior augmentation and adaptation strategies. For the setting in which only one test point is available, we achieve state-of-the-art results on the ImageNet-C, ImageNet-R, and (among ResNet-50 models) ImageNet-A distribution shift benchmarks.
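The marginal-entropy objective described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes a toy linear classifier (logits z = W x, no bias) in place of a deep network, and three hand-written vectors stand in for augmented copies of a single test input. It computes the marginal distribution p̄ = (1/B) Σ_b softmax(z_b) over the B augmentations, its entropy H(p̄), the analytic gradient of H(p̄) with respect to each augmentation's logits, and a gradient-descent step on every weight. All function names (softmax, marginal_entropy, grad_wrt_logits, linear_logits, adapt_step) are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one logit vector."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_entropy(logits_per_aug: Matrix) -> Tuple[float, List[float]]:
    """Entropy of the average (marginal) predictive distribution over B augmentations."""
    probs = [softmax(z) for z in logits_per_aug]
    num_aug, num_classes = len(probs), len(probs[0])
    marginal = [sum(p[k] for p in probs) / num_aug for k in range(num_classes)]
    entropy = -sum(m * math.log(m) for m in marginal if m > 0.0)
    return entropy, marginal


def grad_wrt_logits(logits_per_aug: Matrix) -> Matrix:
    """Analytic gradient dH/dz_bj = -(p_bj / B) * (log m_j - sum_k p_bk * log m_k)."""
    probs = [softmax(z) for z in logits_per_aug]
    _, marginal = marginal_entropy(logits_per_aug)
    log_m = [math.log(m) for m in marginal]
    num_aug = len(probs)
    grads = []
    for p in probs:
        expected_log_m = sum(pk * lk for pk, lk in zip(p, log_m))
        grads.append([-(pj / num_aug) * (lj - expected_log_m) for pj, lj in zip(p, log_m)])
    return grads


def linear_logits(weights: Matrix, x: List[float]) -> List[float]:
    """Hypothetical toy model: logits z = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def adapt_step(weights: Matrix, augmented_inputs: Matrix, lr: float) -> Matrix:
    """One gradient-descent step on the marginal entropy w.r.t. every entry of W."""
    logits = [linear_logits(weights, x) for x in augmented_inputs]
    g_logits = grad_wrt_logits(logits)
    new_weights = []
    for j, row in enumerate(weights):
        new_row = []
        for d, w in enumerate(row):
            # Chain rule through z_bj = sum_d W_jd * x_bd: dH/dW_jd = sum_b dH/dz_bj * x_bd
            grad = sum(g_logits[b][j] * x[d] for b, x in enumerate(augmented_inputs))
            new_row.append(w - lr * grad)
        new_weights.append(new_row)
    return new_weights


if __name__ == "__main__":
    # Hand-written stand-ins for augmented copies of one 2-D test input.
    augmented = [[1.0, 0.2], [0.6, 0.5], [0.9, 0.4]]
    weights = [[1.0, 0.0], [0.0, 1.0]]
    before, _ = marginal_entropy([linear_logits(weights, x) for x in augmented])
    for _ in range(10):
        weights = adapt_step(weights, augmented, lr=0.1)
    after, _ = marginal_entropy([linear_logits(weights, x) for x in augmented])
    print(f"marginal entropy: {before:.4f} -> {after:.4f}")
```

Because all augmented copies share the same weights, each step pushes them toward a common, confident class, which is the invariance-plus-confidence effect the abstract describes. For a deep network one would instead rely on an autodiff framework and a real augmentation pipeline rather than the hand-derived gradient used here.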

Marvin Zhang, Sergey Levine, Chelsea Finn • 2021

Related benchmarks

Task | Dataset | Metric | Result | Rank
Image Classification | ImageNet A | -- | -- | 553
Image Classification | PACS | Overall Average Accuracy | 71.3 | 230
Image Classification | ImageNet V2 (test) | Top-1 Accuracy | 78.6 | 181
Image Classification | ImageNet-A (test) | -- | -- | 154
Image Classification | ImageNet-100 (test) | Clean Accuracy | 88 | 109
Image Classification | CIFAR-10C Severity Level 5 (test) | Average Error Rate (Severity 5) | 61.81 | 62
Image Classification | ImageNet-C level 5 | Avg Top-1 Acc (ImageNet-C L5) | 45.8 | 61
Image Classification | CIFAR-100-C v1 (test) | Error Rate (Average) | 32.91 | 60
Image Classification | ImageNet-C 1.0 (test) | -- | -- | 53
Image Classification | CIFAR-100C Level 5 (test) | Gaussian Acc | 20.2 | 45

Showing 10 of 37 rows.
