Title: A Large-scale Study on Unsupervised Outlier Model Selection: Do Internal Strategies Suffice?

Abstract: Given an unsupervised outlier detection task, how should one select a detection algorithm as well as its hyperparameters (jointly called a model)? Unsupervised model selection is notoriously difficult in the absence of hold-out validation data with ground-truth labels. Therefore, the problem is vastly understudied. In this work, we study the feasibility of employing internal model evaluation strategies for selecting a model for outlier detection. These so-called internal strategies rely solely on the input data (without labels) and the output (outlier scores) of the candidate models. We set up (and open-source) a large testbed with 39 detection tasks and 297 candidate models comprising 8 detectors and various hyperparameter configurations. We evaluate 7 different strategies on their ability to discriminate between models w.r.t. detection performance, without using any labels. Our study reveals room for progress -- we find that none of the strategies would be practically useful, as they select models only comparable to a state-of-the-art detector (with random configuration).
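
To make the notion of an internal strategy concrete, the following is a minimal sketch of label-free model selection that uses only the candidates' outlier scores. It is an illustration, not one of the seven strategies the study evaluates: the consensus criterion (pick the model whose score ranking agrees most with the other candidates), the function names, and the toy scores are all assumptions introduced here.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson on the ranks."""
    return pearson(ranks(x), ranks(y))


def select_by_consensus(scores_by_model):
    """Hypothetical internal criterion: choose the candidate whose
    outlier-score ranking has the highest mean rank correlation with
    the other candidates. No ground-truth labels are used."""
    names = list(scores_by_model)
    if len(names) < 2:
        raise ValueError("need at least two candidate models")
    agreement = {
        name: mean(
            spearman(scores_by_model[name], scores_by_model[other])
            for other in names
            if other != name
        )
        for name in names
    }
    best = max(names, key=agreement.get)
    return best, agreement


# Toy outlier scores for five points from three hypothetical candidates.
candidate_scores = {
    "model_a": [0.1, 0.2, 0.15, 0.9, 0.3],
    "model_b": [0.2, 0.1, 0.25, 0.8, 0.35],
    "model_c": [0.9, 0.1, 0.5, 0.2, 0.4],
}
best, agreement = select_by_consensus(candidate_scores)
print(best, agreement)
```

Agreement among candidates is not evidence of detection quality, which is consistent with the abstract's finding that such label-free criteria may fail to separate good models from poor ones.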
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2104.01422 [cs.LG]
  (or arXiv:2104.01422v2 [cs.LG] for this version)

Submission history

From: Martin Q. Ma
[v1] Sat, 3 Apr 2021 14:56:29 GMT (426kb,D)
[v2] Mon, 12 Apr 2021 19:24:44 GMT (426kb,D)
