
Title: Merging $K$-means with hierarchical clustering for identifying general-shaped groups

Abstract: Clustering partitions a dataset so that observations placed together in a group are similar to one another but different from those in other groups. Hierarchical and $K$-means clustering are two such approaches, each with its own strengths and weaknesses. For instance, hierarchical clustering identifies groups in a tree-like structure but is computationally expensive on large datasets, while $K$-means clustering is efficient but is designed to identify homogeneous, spherically-shaped clusters. We present a hybrid non-parametric clustering approach that combines the two methods to identify general-shaped clusters and that can be applied to larger datasets. Specifically, we first partition the dataset into spherical groups using $K$-means. We then merge these groups using hierarchical methods, with a data-driven distance measure as a stopping criterion. Our proposal has the potential to reveal groups with general shapes and structure in a dataset. We demonstrate good performance on several simulated and real datasets.
Comments: 16 pages, 1 table, 9 figures; accepted for publication in Stat
Subjects: Machine Learning (stat.ML); Computation (stat.CO); Methodology (stat.ME)
DOI: 10.1002/sta4.172
Cite as: arXiv:1712.08786 [stat.ML]
  (or arXiv:1712.08786v1 [stat.ML] for this version)
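
The two-stage idea described in the abstract (over-partition with $K$-means, then merge the pieces hierarchically) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify the data-driven distance measure, so the sketch assumes single-linkage merging of the $K$-means centers with a hand-picked cutoff. The function name `kmeans_then_merge`, the parameters `n_initial` and `merge_threshold`, and the toy data are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans


def kmeans_then_merge(X, n_initial, merge_threshold, seed=0):
    """Over-partition X with K-means, then merge the pieces hierarchically.

    ASSUMPTION: merging uses single linkage on the K-means centers and stops
    at a fixed distance cutoff. The paper uses a data-driven distance measure
    instead, which the abstract does not describe.
    """
    # Stage 1: split the data into many small, roughly spherical groups.
    km = KMeans(n_clusters=n_initial, n_init=10, random_state=seed).fit(X)

    # Stage 2: agglomerate the group centers and cut the tree at the cutoff.
    tree = linkage(km.cluster_centers_, method="single")
    center_groups = fcluster(tree, t=merge_threshold, criterion="distance")

    # Map every observation to the merged group of its K-means cluster.
    return center_groups[km.labels_]


# Toy data: two long, thin, parallel strips, which plain K-means with K=2
# is not designed to recover because neither strip is spherical.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
strip_a = np.column_stack([x, rng.normal(0.0, 0.1, x.size)])
strip_b = np.column_stack([x, rng.normal(10.0, 0.1, x.size)])
X = np.vstack([strip_a, strip_b])

labels = kmeans_then_merge(X, n_initial=10, merge_threshold=6.0)
n_groups = len(np.unique(labels))
```

In this toy setting the cutoff is chosen by hand to lie between the spacing of centers within a strip and the gap between the strips. Replacing that manual choice with a criterion estimated from the data is the part the paper addresses.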

Submission history

From: Ranjan Maitra
[v1] Sat, 23 Dec 2017 15:07:00 GMT (2917kb,D)
