
PIDForest: Anomaly Detection via Partial Identification

About

We consider the problem of detecting anomalies in a large dataset. We propose a framework called Partial Identification which captures the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values. Formalizing this intuition, we propose a geometric anomaly measure for a point that we call PIDScore, which measures the minimum density of data points over all subcubes containing the point. We present PIDForest: a random-forest-based algorithm that finds anomalies based on this definition. We show that it performs favorably in comparison to several popular anomaly detection methods, across a broad range of benchmarks. PIDForest also provides a succinct explanation for why a point is labelled anomalous, by providing a set of features and ranges for them which are relatively uncommon in the dataset.
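To make the PIDScore definition concrete, here is a brute-force sketch of "minimum density over all subcubes containing the point" for tiny, low-dimensional data. It is not PIDForest and not the authors' code. The function name `pid_score`, the restriction of candidate boxes to a fixed per-feature grid of cut points, and the normalization (fraction of points divided by fraction of volume) are all assumptions made for illustration; the paper may normalize differently or report the reciprocal, which ranks points the same way. Lower density here means more anomalous, and the minimizing box doubles as the "features and ranges" explanation the abstract describes.

```python
from itertools import combinations, product

import numpy as np


def pid_score(x, data, grid):
    """Minimum normalized density over axis-aligned boxes that contain x.

    Illustrative assumptions (not from the paper): candidate boxes are products
    of intervals whose endpoints lie on a fixed grid of distinct cut points per
    feature, and density = (fraction of points inside) / (fraction of volume).
    x must lie within the grid bounds. Returns (min_density, box), where box is
    a list of (low, high) ranges, one per feature.
    """
    data = np.asarray(data, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(data)
    # Volume of the domain spanned by the grid, used to normalize box volumes.
    total_volume = np.prod([cuts[-1] - cuts[0] for cuts in grid])

    # For each feature, every grid interval [lo, hi] containing x's coordinate.
    candidate_intervals = [
        [(lo, hi) for lo, hi in combinations(sorted(cuts), 2) if lo <= x[d] <= hi]
        for d, cuts in enumerate(grid)
    ]

    best_density, best_box = np.inf, None
    # Exhaustive search over all boxes: exponential in the number of features.
    for box in product(*candidate_intervals):
        lows = np.array([lo for lo, _ in box])
        highs = np.array([hi for _, hi in box])
        inside = np.all((data >= lows) & (data <= highs), axis=1)
        volume_fraction = np.prod(highs - lows) / total_volume
        density = (inside.sum() / n) / volume_fraction
        if density < best_density:
            best_density = density
            best_box = [(float(lo), float(hi)) for lo, hi in box]
    return float(best_density), best_box


# Toy data: a tight cluster around (0.5, 0.5) plus one isolated point.
rng = np.random.default_rng(0)
inliers = np.clip(0.5 + 0.05 * rng.standard_normal((200, 2)), 0.0, 1.0)
outlier = np.array([0.95, 0.05])
data = np.vstack([inliers, outlier])
grid = [np.linspace(0.0, 1.0, 5)] * 2  # cut points 0, 0.25, 0.5, 0.75, 1

for name, point in [("inlier", [0.5, 0.5]), ("outlier", outlier)]:
    density, box = pid_score(point, data, grid)
    print(name, round(density, 3), box)
```

The isolated point should receive a far lower minimum density than the cluster center, and its minimizing box isolates it using one or two feature ranges. Because the search enumerates every box, it only scales to a handful of features and cut points; per the abstract, PIDForest instead uses a random-forest-based algorithm built on this definition.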

Parikshit Gopalan, Vatsal Sharan, Udi Wieder · 2019

Related benchmarks

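| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Anomaly Detection | Shuttle | AUC | 0.864 | 39 |
| Anomaly Detection | Pageblocks | AUC-ROC | 0.851 | 32 |
| Anomaly Detection | Fraud | AUC-PR | 0.186 | 21 |
| Anomaly Detection | R8 | AUC-ROC | 88.1 | 10 |
| Anomaly Detection | COVER | AUC-ROC | 0.939 | 10 |
| Anomaly Detection | Exploits | AUC-ROC | 79.7 | 10 |
| Anomaly Detection | Analysis | AUC-ROC | 0.82 | 10 |
| Anomaly Detection | Backdoor | AUC-ROC | 0.808 | 10 |
| Anomaly Detection | DOS | AUC-ROC | 0.802 | 10 |

The R8 and Exploits results are reported on a 0 to 100 scale; all other results are on a 0 to 1 scale.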
