Title: An Information-theoretic Perspective of Hierarchical Clustering

Abstract: A combinatorial cost function for hierarchical clustering was introduced by Dasgupta \cite{dasgupta2016cost}. It was generalized by Cohen-Addad et al. \cite{cohen2019hierarchical} to a general form called an admissible function. In this paper, we investigate hierarchical clustering from an \emph{information-theoretic} perspective and formulate a new objective function. We also establish the relationship between these two perspectives. On the algorithmic side, we dispense with the traditional top-down and bottom-up frameworks and propose a new one that recursively stratifies the \emph{sparsest} level of a cluster tree, guided by our objective function. For practical use, our resulting cluster tree is not binary. Our algorithm, called HCSE, outputs a $k$-level cluster tree and uses a novel, interpretable mechanism to choose $k$ automatically without any hyper-parameter. Our experimental results on synthetic datasets show that HCSE has a clear advantage in identifying the intrinsic number of hierarchies, and the results on real datasets show that HCSE also achieves costs competitive with the popular algorithms LOUVAIN and HLP.
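For readers unfamiliar with the combinatorial cost function the abstract starts from, the following is a minimal sketch of Dasgupta's cost: each edge contributes its weight times the number of leaves under the lowest common ancestor of its endpoints. It is not the information-theoretic objective or the HCSE algorithm proposed in the paper. The nested-tuple tree representation, the function names `leaves` and `dasgupta_cost`, and the toy graph are all assumptions made for illustration.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaves under a nested-tuple cluster tree (hypothetical representation)."""
    if not isinstance(tree, tuple):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def dasgupta_cost(tree, weights):
    """Sum over edges of weight * number of leaves under the endpoints' lowest common ancestor."""
    if not isinstance(tree, tuple):
        return 0.0  # a single leaf separates no edge
    child_leaves = [leaves(child) for child in tree]
    size = sum(len(s) for s in child_leaves)
    cost = 0.0
    # An edge whose endpoints lie in different children has this node as its
    # lowest common ancestor, so it is charged the size of this cluster.
    for a, b in combinations(child_leaves, 2):
        for u in a:
            for v in b:
                w = weights.get((u, v), weights.get((v, u), 0.0))
                cost += w * size
    # Edges inside one child are charged deeper in the tree.
    for child in tree:
        cost += dasgupta_cost(child, weights)
    return cost


# Toy graph (assumed for illustration): two tight pairs joined by a weak edge.
weights = {("a", "b"): 1.0, ("c", "d"): 1.0, ("b", "c"): 0.1}
good_tree = (("a", "b"), ("c", "d"))  # keeps the heavy edges low in the tree
bad_tree = (("a", "c"), ("b", "d"))   # splits the heavy edges at the root

print(dasgupta_cost(good_tree, weights), dasgupta_cost(bad_tree, weights))
```

A tree that cuts heavy edges near the root pays for them at the full cluster size, so the first tree scores lower than the second. The sketch accepts nodes with more than two children, which is relevant because the abstract notes that the resulting cluster tree is not binary.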
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2108.06036 [cs.LG]
  (or arXiv:2108.06036v1 [cs.LG] for this version)

Submission history

From: Yicheng Pan
[v1] Fri, 13 Aug 2021 03:03:56 GMT (1551kb,D)
