Mathematics > Statistics Theory
Title: Bayesian Cluster Enumeration Criterion for Unsupervised Learning
(Submitted on 22 Oct 2017 (v1), last revised 27 Aug 2018 (this version, v3))
Abstract: We derive a new Bayesian Information Criterion (BIC) by formulating the problem of estimating the number of clusters in an observed data set as maximization of the posterior probability of the candidate models. Given that some mild assumptions are satisfied, we provide a general BIC expression for a broad class of data distributions. This serves as a starting point when deriving the BIC for specific distributions. Along this line, we provide a closed-form BIC expression for multivariate Gaussian distributed variables. We show that incorporating the data structure of the clustering problem into the derivation of the BIC results in an expression whose penalty term is different from that of the original BIC. We propose a two-step cluster enumeration algorithm. First, a model-based unsupervised learning algorithm partitions the data according to a given set of candidate models. Subsequently, the number of clusters is determined as the one associated with the model for which the proposed BIC is maximal. The performance of the proposed two-step algorithm is tested using synthetic and real data sets.
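The two-step procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not state the proposed closed-form criterion, so the sketch substitutes the original Schwarz BIC in its "larger is better" form, whose penalty term the abstract says differs from the proposed one. It also assumes one-dimensional data and uses plain k-means as a stand-in for a model-based clustering algorithm. The function names `kmeans_1d`, `schwarz_bic`, and `enumerate_clusters` are hypothetical.

```python
import math
import random


def kmeans_1d(data, k, iters=100):
    """Step 1 stand-in: Lloyd's k-means on scalars (assumption: the paper
    uses a model-based clusterer; k-means is only a simple substitute)."""
    ordered = sorted(data)
    # Deterministic initialisation at evenly spaced quantiles.
    centers = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    clusters = [list(data)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            clusters[nearest].append(x)
        updated = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
        if updated == centers:  # assignments have stabilised
            break
        centers = updated
    return [c for c in clusters if c]  # drop empty clusters


def schwarz_bic(clusters, var_floor=1e-6):
    """Original BIC, log-likelihood minus (q / 2) * log(N), for a hard
    partition into univariate Gaussian clusters. NOT the paper's criterion."""
    n_total = sum(len(c) for c in clusters)
    log_lik = 0.0
    for c in clusters:
        n = len(c)
        mean = sum(c) / n
        var = max(sum((x - mean) ** 2 for x in c) / n, var_floor)
        # Mixing proportion term plus Gaussian log-likelihood at the ML estimates.
        log_lik += (n * math.log(n / n_total)
                    - 0.5 * n * math.log(2 * math.pi * var)
                    - 0.5 * n)
    # Free parameters: one mean and one variance per cluster, plus
    # (number of clusters - 1) mixing proportions.
    n_params = 3 * len(clusters) - 1
    return log_lik - 0.5 * n_params * math.log(n_total)


def enumerate_clusters(data, candidates):
    """Step 2: choose the candidate whose partition maximises the criterion."""
    scores = {k: schwarz_bic(kmeans_1d(data, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Synthetic data: three well-separated unit-variance Gaussian clusters.
rng = random.Random(0)
data = [rng.gauss(mu, 1.0) for mu in (-10.0, 0.0, 10.0) for _ in range(200)]
best_k, scores = enumerate_clusters(data, range(1, 7))
print(best_k)
```

Swapping `schwarz_bic` for another scoring function with the same signature would leave the enumeration step unchanged, which is the sense in which the two steps are decoupled.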
Submission history
From: Freweyni Kidane Teklehaymanot
[v1] Sun, 22 Oct 2017 14:59:08 GMT (468kb)
[v2] Fri, 15 Dec 2017 17:31:04 GMT (479kb)
[v3] Mon, 27 Aug 2018 13:53:56 GMT (500kb)