Automating Outlier Detection via Meta-Learning
About
Given an unsupervised outlier detection (OD) task on a new dataset, how can we automatically select a good outlier detection method and its hyperparameter(s) (collectively called a model)? Thus far, model selection for OD has been a "black art," since model evaluation is infeasible without (i) hold-out data with labels and (ii) a universal objective function. In this work, we develop the first principled data-driven approach to model selection for OD, called MetaOD, based on meta-learning. MetaOD capitalizes on the past performance of a large body of detection models on existing outlier detection benchmark datasets, and carries over this prior experience to automatically select an effective model for a new dataset without using any labels. To capture task similarity, we introduce specialized meta-features that quantify the outlying characteristics of a dataset. Through comprehensive experiments, we show that MetaOD selects a detection model that significantly outperforms the most popular outlier detectors (e.g., LOF and iForest) as well as various state-of-the-art unsupervised meta-learners, while being extremely fast. To foster reproducibility and further research on this new problem, we open-source our entire meta-learning system, benchmark environment, and testbed datasets.
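The general idea of carrying over past performance through dataset similarity can be illustrated with a small sketch. This is not MetaOD's actual algorithm: it is a simplified nearest-neighbor stand-in, and the function name `select_model`, the two toy meta-features, the performance numbers, and the hyperparameter values in the model names are all hypothetical, chosen only for illustration.

```python
import math
from statistics import mean, pstdev


def select_model(meta_train, perf, meta_new, k=2):
    """Return the index of the model recommended for a new, unlabeled dataset.

    meta_train: list of meta-feature vectors, one per historical dataset
    perf:       perf[i][j] is the past score (e.g. AUROC) of model j on dataset i
    meta_new:   meta-feature vector of the new dataset
    k:          number of similar historical datasets to consult
    """
    # Standardize each meta-feature so no single one dominates the distance.
    cols = list(zip(*meta_train))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance

    def standardize(row):
        return [(x - m) / s for x, m, s in zip(row, mus, sigmas)]

    # Find the k historical datasets closest to the new one in meta-feature space.
    z_new = standardize(meta_new)
    dists = [math.dist(standardize(row), z_new) for row in meta_train]
    neighbors = sorted(range(len(meta_train)), key=dists.__getitem__)[:k]

    # Recommend the model with the best average past score on those neighbors.
    n_models = len(perf[0])
    avg = [mean(perf[i][j] for i in neighbors) for j in range(n_models)]
    return max(range(n_models), key=avg.__getitem__)


# Toy, made-up numbers purely for illustration (not results from the paper).
meta_train = [[0.1, 5.0], [0.2, 4.8], [0.9, 1.0], [1.0, 1.2]]
perf = [[0.90, 0.60], [0.85, 0.65], [0.55, 0.80], [0.50, 0.88]]
models = ["LOF(n_neighbors=20)", "iForest(n_estimators=100)"]

choice = select_model(meta_train, perf, [0.95, 1.1])
print(models[choice])
```

No labels from the new dataset are used at any point: the recommendation depends only on its meta-features and on the historical performance table, which is the property the abstract emphasizes.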
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Outlier Detection | InternetAds ADBench | AUROC 69.62 | 31 |
| Outlier Detection | fault ADBench | AUROC 57.21 | 31 |
| Outlier Detection | optdigits ADBench | AUROC 87.22 | 17 |
| Outlier Detection | SpamBase ADBench | AUROC 66.2 | 17 |
| Outlier Detection | speech (historical) | AUROC 54.88 | 17 |
| Outlier Detection | letter (historical) | AUROC 90.09 | 17 |
| Outlier Detection | campaign (historical) | AUROC 76.21 | 17 |
| Outlier Detection | http (historical) | AUROC 0.9995 | 17 |
| Outlier Detection | vowels (historical) | AUROC 94.99 | 17 |
| Outlier Detection | Pima ADBench | AUROC 70.89 | 17 |