

Title: Hierarchical clustering: visualization, feature importance and model selection

Abstract: We propose methods for the analysis of hierarchical clustering that fully use the multi-resolution structure provided by a dendrogram. Specifically, we propose a loss for choosing between clustering methods, a feature importance score, and a graphical tool for visualizing the segmentation of features in a dendrogram. Current approaches to these tasks lose information because they require the user to generate a single partition of the instances by cutting the dendrogram at a specified level. Our proposed methods, instead, use the full structure of the dendrogram. The key insight behind the proposed methods is to view a dendrogram as a phylogeny. This analogy permits the assignment of a feature value to each internal node of a tree through an evolutionary model. Real and simulated datasets provide evidence that our proposed framework has desirable outcomes and gives more insight than state-of-the-art approaches. We provide an R package that implements our methods.
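The abstract states that an evolutionary model assigns a feature value to each internal node of the dendrogram, but it does not say which model. As an illustration only, the sketch below assumes a Brownian-motion model and treats each branch length as the difference between a node's merge height and its parent's merge height (leaves at height 0). It applies Felsenstein's pruning pass (independent contrasts): each internal node receives the inverse-variance weighted average of its two children, and its branch is lengthened to reflect the uncertainty of that estimate. The names `Merge`, `node_height` and `prune` are hypothetical and are not the API of the authors' R package, and this is not necessarily the authors' method.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class Merge:
    """Hypothetical internal dendrogram node: two children joined at a height."""
    name: str
    left: "Node"
    right: "Node"
    height: float


# A node is either a leaf label or an internal merge.
Node = Union[str, Merge]


def node_height(node: Node) -> float:
    # Leaves sit at height 0; internal nodes sit at their merge height.
    return 0.0 if isinstance(node, str) else node.height


def prune(node: Node, leaf_values: Dict[str, float],
          internal: Dict[str, float]) -> Tuple[float, float]:
    """Pruning pass under an assumed Brownian-motion model.

    Returns (value, extra), where extra is the branch-length extension
    that accounts for uncertainty in the node's estimated value.
    Stores each internal node's value in `internal`, keyed by name.
    """
    if isinstance(node, str):
        return leaf_values[node], 0.0
    x_l, e_l = prune(node.left, leaf_values, internal)
    x_r, e_r = prune(node.right, leaf_values, internal)
    # Branch length = parent height - child height, plus the child's extension.
    v_l = node.height - node_height(node.left) + e_l
    v_r = node.height - node_height(node.right) + e_r
    if v_l <= 0 or v_r <= 0:
        raise ValueError("effective branch lengths must be positive")
    # Inverse-variance weighted average of the two children.
    value = (x_l / v_l + x_r / v_r) / (1.0 / v_l + 1.0 / v_r)
    internal[node.name] = value
    return value, v_l * v_r / (v_l + v_r)


# Toy dendrogram: a and b merge at height 1; c joins them at height 3.
tree = Merge("root", Merge("n1", "a", "b", 1.0), "c", 3.0)
internal: Dict[str, float] = {}
prune(tree, {"a": 0.0, "b": 2.0, "c": 10.0}, internal)
print(internal)
```

In this toy example, `n1` receives 1.0 (the midpoint of `a` and `b`, whose branches have equal length) and `root` receives 56/11 ≈ 5.09, which is the maximum-likelihood root value under Brownian motion on this tree. Values at non-root internal nodes use only their own subtree; a full marginal reconstruction would also propagate information down from the root.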
Comments: 29 pages, 9 figures, 3 tables
Subjects: Methodology (stat.ME); Machine Learning (cs.LG)
ACM classes: I.5.3
Cite as: arXiv:2112.01372 [stat.ME]
  (or arXiv:2112.01372v2 [stat.ME] for this version)

Submission history

From: Rafael Stern
[v1] Tue, 30 Nov 2021 20:38:17 GMT (1959kb,D)
[v2] Sat, 28 Jan 2023 03:50:27 GMT (2008kb,D)
