Your Out-of-Distribution Detection Method is Not Robust!

About

Out-of-distribution (OOD) detection has recently gained substantial attention due to the importance of identifying out-of-domain samples for reliability and safety. Although OOD detection methods have advanced a great deal, they remain susceptible to adversarial examples, which defeats their purpose. To mitigate this issue, several defenses have recently been proposed. Nevertheless, these efforts remain ineffective, as their evaluations are based on either small perturbation sizes or weak attacks. In this work, we re-examine these defenses against an end-to-end PGD attack on in/out data with larger perturbation sizes, e.g., up to the commonly used $\epsilon=8/255$ for the CIFAR-10 dataset. Surprisingly, almost all of these defenses perform worse than random detection under the adversarial setting. Next, we aim to provide a robust OOD detection method. In an ideal defense, the training should expose the model to almost all possible adversarial perturbations, which can be achieved through adversarial training. That is, such training perturbations should be based on both in- and out-of-distribution samples. Therefore, unlike OOD detection in the standard setting, access to OOD samples, as well as in-distribution samples, appears necessary in the adversarial training setup. These observations lead us to adopt generative OOD detection methods, such as OpenGAN, as a baseline. We subsequently propose the Adversarially Trained Discriminator (ATD), which utilizes a pre-trained robust model to extract robust features, and a generator model to create OOD samples. Using ATD with CIFAR-10 and CIFAR-100 as the in-distribution data, we significantly outperform all previous methods in robust AUROC while maintaining high standard AUROC and classification accuracy. The code repository is available at https://github.com/rohban-lab/ATD.
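To make the evaluation setting in the abstract concrete, the following is a minimal sketch of an L-infinity PGD attack applied to both in-distribution and OOD inputs, followed by an AUROC computation. It is not the ATD implementation: the linear-sigmoid detector, the synthetic data, the step size, the number of steps, and all function names (`pgd_on_detector`, `auroc`) are assumptions chosen for illustration. Only the budget $\epsilon=8/255$ comes from the abstract. In-distribution inputs are pushed toward lower detector scores and OOD inputs toward higher scores, which is what drives robust AUROC down.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pgd_on_detector(x, is_in_dist, w, b, eps=8 / 255, step=2 / 255, n_steps=10):
    """L-infinity PGD against a toy detector s(x) = sigmoid(w.x + b).

    A high score means "in-distribution". In-distribution rows are pushed
    toward lower scores and OOD rows toward higher scores.
    """
    # -1 for in-distribution rows (descend the score), +1 for OOD rows (ascend).
    direction = np.where(is_in_dist, -1.0, 1.0)[:, None]
    x_adv = x.copy()
    for _ in range(n_steps):
        s = sigmoid(x_adv @ w + b)
        # Analytic gradient of the score with respect to the input.
        grad = (s * (1.0 - s))[:, None] * w[None, :]
        x_adv = x_adv + step * direction * np.sign(grad)
        # Project back into the eps-ball around x, then into the valid pixel range.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv


def auroc(scores_in, scores_out):
    """Probability that an in-distribution sample outscores an OOD sample (ties count half)."""
    diff = scores_in[:, None] - scores_out[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Synthetic stand-in data (assumption): two slightly shifted pixel distributions.
rng = np.random.default_rng(0)
d, n = 64, 200
x_in = np.clip(0.52 + 0.1 * rng.standard_normal((n, d)), 0.0, 1.0)
x_out = np.clip(0.48 + 0.1 * rng.standard_normal((n, d)), 0.0, 1.0)

# Hypothetical fixed detector weights, with the decision boundary at mean pixel value 0.5.
w = np.full(d, 0.5)
b = -0.5 * w.sum()

x = np.concatenate([x_in, x_out])
is_in_dist = np.concatenate([np.ones(n, dtype=bool), np.zeros(n, dtype=bool)])
x_adv = pgd_on_detector(x, is_in_dist, w, b)

clean_auroc = auroc(sigmoid(x_in @ w + b), sigmoid(x_out @ w + b))
robust_auroc = auroc(sigmoid(x_adv[:n] @ w + b), sigmoid(x_adv[n:] @ w + b))
print(f"clean AUROC:  {clean_auroc:.3f}")
print(f"robust AUROC: {robust_auroc:.3f}")
```

In this toy setup, an undefended detector that separates the two distributions well on clean inputs falls below chance once both sides are attacked, mirroring the "worse than random detection" observation in the abstract. A real evaluation would replace the analytic gradient with backpropagation through the full detection pipeline.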

Mohammad Azizmalayeri, Arshia Soltani Moakhar, Arman Zarei, Reihaneh Zohrabi, Mohammad Taghi Manzuri, Mohammad Hossein Rohban • 2022

Related benchmarks

Task	Dataset	Result
OOD Detection	CIFAR-100 standard (test)	AUROC (%): 87.7	94
Out-of-Distribution Detection	CIFAR-10 (test)	AUROC: 0.94	52
OOD Detection	CIFAR-10 (test)	Clean AUROC: 0.943	27
OOD Detection	CIFAR-10 standard (test)	AUROC: 0.943	25
OOD Detection	TinyImageNet	AUROC (Clean): 0.883	17
Adversarial Out-of-Distribution Detection	CIFAR-10 In 1.0 (test)	AUROC: 0.93	7
Adversarial Out-of-Distribution Detection	CIFAR-10 In and Out 1.0 (test)	AUROC: 0.928	7
OOD Detection	CIFAR-10	Clean Score: 0.943	2
OOD Detection	CIFAR-100	Clean Score: 87.7	2
