Title: PIDForest: Anomaly Detection via Partial Identification

Abstract: We consider the problem of detecting anomalies in a large dataset. We propose a framework called Partial Identification which captures the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values. Formalizing this intuition, we propose a geometric anomaly measure for a point that we call PIDScore, which measures the minimum density of data points over all subcubes containing the point. We present PIDForest: a random-forest-based algorithm that finds anomalies based on this definition. We show that it performs favorably in comparison to several popular anomaly detection methods, across a broad range of benchmarks. PIDForest also provides a succinct explanation for why a point is labelled anomalous, by providing a set of features and ranges for them that are relatively uncommon in the dataset.
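
The "minimum density over all subcubes containing the point" idea from the abstract can be illustrated with a small brute-force sketch. This is not the PIDForest algorithm and not the authors' code. It assumes data scaled to the unit cube, restricts the search to axis-aligned boxes whose edges lie on a coarse grid, and takes density to mean the fraction of points in a box divided by the box volume. The names `min_density` and `grid` are hypothetical.

```python
from itertools import product


def min_density(point, data, grid=4):
    """Brute-force minimum density over grid-aligned boxes containing `point`.

    Assumption: every coordinate of `point` and of each row in `data`
    lies in [0, 1]. Density of a box = (fraction of data inside) / volume.
    """
    edges = [i / grid for i in range(grid + 1)]

    # For each dimension, list every grid interval [lo, hi] that contains
    # the corresponding coordinate of the query point.
    intervals = [
        [(lo, hi) for lo in edges for hi in edges if lo < hi and lo <= x <= hi]
        for x in point
    ]

    best = float("inf")
    # A box is one interval per dimension; enumerate all combinations.
    for box in product(*intervals):
        volume = 1.0
        for lo, hi in box:
            volume *= hi - lo
        # Count the data points that fall inside the box (boundaries inclusive).
        count = sum(
            all(lo <= y[j] <= hi for j, (lo, hi) in enumerate(box))
            for y in data
        )
        best = min(best, count / (len(data) * volume))
    return best


# Toy data: a tight cluster near the origin plus one isolated point.
data = [(0.1, 0.1), (0.12, 0.15), (0.2, 0.1), (0.15, 0.2), (0.9, 0.9)]
outlier_score = min_density((0.9, 0.9), data)
inlier_score = min_density((0.1, 0.1), data)
```

In this toy example the isolated point sits in a large, nearly empty box, so its minimum density comes out lower than that of a point inside the cluster. The enumeration is exponential in the number of dimensions, so it only serves to illustrate the definition on tiny inputs.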
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:1912.03582 [cs.LG]
  (or arXiv:1912.03582v1 [cs.LG] for this version)

Submission history

From: Vatsal Sharan
[v1] Sun, 8 Dec 2019 00:43:42 GMT
