Title: Bayesian Cluster Enumeration Criterion for Unsupervised Learning

Abstract: We derive a new Bayesian Information Criterion (BIC) by formulating the problem of estimating the number of clusters in an observed data set as maximization of the posterior probability of the candidate models. Given that some mild assumptions are satisfied, we provide a general BIC expression for a broad class of data distributions. This serves as a starting point when deriving the BIC for specific distributions. Along this line, we provide a closed-form BIC expression for multivariate Gaussian distributed variables. We show that incorporating the data structure of the clustering problem into the derivation of the BIC results in an expression whose penalty term is different from that of the original BIC. We propose a two-step cluster enumeration algorithm. First, a model-based unsupervised learning algorithm partitions the data according to a given set of candidate models. Subsequently, the number of clusters is determined as the one associated with the model for which the proposed BIC is maximal. The performance of the proposed two-step algorithm is tested using synthetic and real data sets.
Comments: 14 pages, 7 figures
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
DOI: 10.1109/TSP.2018.2866385
Cite as: arXiv:1710.07954 [math.ST]
  (or arXiv:1710.07954v3 [math.ST] for this version)

Submission history

From: Freweyni Kidane Teklehaymanot
[v1] Sun, 22 Oct 2017 14:59:08 GMT (468kb)
[v2] Fri, 15 Dec 2017 17:31:04 GMT (479kb)
[v3] Mon, 27 Aug 2018 13:53:56 GMT (500kb)
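
The two-step procedure outlined in the abstract (partition the data for each candidate model, then keep the candidate with the largest criterion value) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: plain 1-D k-means stands in for the model-based partitioning step, and Schwarz's original BIC applied to a hard-assignment Gaussian likelihood stands in for the criterion. The abstract states that the paper's own BIC has a different penalty term, and that expression is not reproduced here. The function names are hypothetical.

```python
import math
import random


def kmeans_1d(data, k, iters=100):
    """Plain Lloyd's k-means on 1-D data with deterministic quantile initialization."""
    ordered = sorted(data)
    n = len(ordered)
    centers = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = [0] * len(data)
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in data]
        for j in range(k):
            members = [x for x, lab in zip(data, new_labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = sum(members) / len(members)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def classification_log_likelihood(data, labels, k, var_floor=1e-6):
    """Gaussian log-likelihood of a hard partition, with cluster weights N_j / N."""
    n = len(data)
    total = 0.0
    for j in range(k):
        members = [x for x, lab in zip(data, labels) if lab == j]
        n_j = len(members)
        if n_j == 0:
            continue
        mean = sum(members) / n_j
        # Maximum-likelihood variance, floored to avoid log(0) for tiny clusters.
        var = max(sum((x - mean) ** 2 for x in members) / n_j, var_floor)
        total += n_j * math.log(n_j / n) - 0.5 * n_j * (math.log(2 * math.pi * var) + 1.0)
    return total


def schwarz_bic(log_lik, k, n):
    """Original BIC in 'larger is better' form (a stand-in, NOT the paper's criterion)."""
    num_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return log_lik - 0.5 * num_params * math.log(n)


def estimate_num_clusters(data, candidates):
    """Step 1: partition the data for each candidate k. Step 2: pick the k with maximal BIC."""
    scores = {}
    for k in candidates:
        labels = kmeans_1d(data, k)
        log_lik = classification_log_likelihood(data, labels, k)
        scores[k] = schwarz_bic(log_lik, k, len(data))
    best_k = max(scores, key=scores.get)
    return best_k, scores


if __name__ == "__main__":
    # Synthetic example: three well-separated 1-D Gaussian clusters.
    rng = random.Random(0)
    data = [rng.gauss(c, 1.0) for c in (-10.0, 0.0, 10.0) for _ in range(100)]
    best_k, scores = estimate_num_clusters(data, range(1, 7))
    print("estimated number of clusters:", best_k)
```

Swapping in a different criterion only requires replacing `schwarz_bic`; the surrounding loop over candidate models stays the same.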
