
WAIC, but Why? Generative Ensembles for Robust Anomaly Detection

About

Machine learning models encounter Out-of-Distribution (OoD) errors when the data seen at test time are generated by a different stochastic process than the one that generated the training data. One proposal for scaling OoD detection to high-dimensional data is to learn a tractable likelihood approximation of the training distribution and use it to reject unlikely inputs. However, likelihood models on natural data are themselves susceptible to OoD errors, and can even assign large likelihoods to samples from other datasets. To mitigate this problem, we propose Generative Ensembles, which robustify density-based OoD detection by estimating the epistemic uncertainty of the likelihood model. We present a puzzling observation in need of an explanation: although likelihood measures cannot account for the typical set of a distribution, and therefore should not be suitable on their own for OoD detection, WAIC performs surprisingly well in practice.
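The abstract's core idea, scoring inputs by WAIC over an ensemble of independently trained likelihood models, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the per-model log-likelihoods have already been computed, and the mean-minus-variance form follows the standard WAIC expression (expected log-likelihood penalized by its variance across models, the epistemic-uncertainty term the abstract describes). The function name and the synthetic numbers are our own.

```python
import numpy as np

def waic_score(log_likelihoods):
    """WAIC anomaly score from an ensemble of likelihood models.

    log_likelihoods: array of shape (n_models, n_inputs), where entry
    [k, i] is log p_theta_k(x_i) under the k-th independently trained
    generative model. Higher score => more in-distribution.
    """
    mean_ll = log_likelihoods.mean(axis=0)  # E_theta[log p(x)]
    var_ll = log_likelihoods.var(axis=0)    # Var_theta[log p(x)], epistemic penalty
    return mean_ll - var_ll

# Toy example (synthetic numbers): a 5-model ensemble scoring 3 inputs.
rng = np.random.default_rng(0)
in_dist = rng.normal(-100.0, 1.0, size=(5, 3))   # models roughly agree -> small variance
ood = rng.normal(-100.0, 30.0, size=(5, 3))      # models disagree -> large variance penalty
print(waic_score(in_dist))
print(waic_score(ood))
```

Because the variance term grows where ensemble members disagree, OoD inputs that one model happens to assign high likelihood are still penalized when the other models do not.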

Hyunsun Choi, Eric Jang, Alexander A. Alemi · 2018

Related benchmarks

Task | Dataset | Metric | Result | Rank
Out-of-Distribution Detection | CIFAR-10 vs SVHN (test) | AUROC | 14.3 | 101
Out-of-Distribution Detection | CIFAR-10 vs CIFAR-100 (test) | AUROC | 53.2 | 93
Out-of-Distribution Detection | CIFAR-10 (ID) vs SVHN (OOD) (test) | AUROC | 100 | 79
Out-of-Distribution Detection | FashionMNIST (ID) vs MNIST (OoD) | AUROC | 0.766 | 61
Out-of-Distribution Detection | CIFAR-10 (ID) vs Celeb-A (OOD) | AUROC | 99.7 | 55
Out-of-Distribution Detection | MNIST Out-of-Distribution (test) | -- | -- | 7
OOD Detection | FashionMNIST (In-Distribution) vs Omniglot (Out-of-Distribution) original (test) | AUROC | 0.796 | 4
Out-of-Distribution Detection | OMNIGLOT Out-of-Distribution (test) | AUROC | 56.8 | 3
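The AUROC figures above treat OoD detection as binary ranking: label in-distribution test inputs positive, OoD inputs negative, and measure how well the detector's score separates the two sets. As a hypothetical illustration (the helper below is ours, not from the paper), AUROC equals the probability that a randomly chosen in-distribution input outscores a randomly chosen OoD input:

```python
import numpy as np

def auroc(scores_id, scores_ood):
    """AUROC as the probability that a random in-distribution input
    receives a higher detector score than a random OoD input,
    counting ties as half a win."""
    scores_id = np.asarray(scores_id)[:, None]
    scores_ood = np.asarray(scores_ood)[None, :]
    wins = (scores_id > scores_ood).mean()
    ties = (scores_id == scores_ood).mean()
    return wins + 0.5 * ties

# 5 of the 6 (ID, OoD) pairs are correctly ordered -> 5/6
print(auroc([3.0, 2.5, 2.0], [1.0, 2.2]))  # 0.8333...
```

A score of 0.5 (or 50) means the detector ranks no better than chance, which is why some likelihood-only results in the table fall well below that mark.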
