Title: Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes

Abstract: A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, and the choice of such criteria depends on the context and aim of clustering. Therefore, researchers need to consider what data analytic characteristics the clusters they are aiming at are supposed to have, among others within-cluster homogeneity, between-clusters separation, and stability. Here, a set of internal clustering validity indexes measuring different aspects of clustering quality is proposed, including some indexes from the literature. Users can choose the indexes that are relevant in the application at hand. In order to measure the overall quality of a clustering (for comparing clusterings from different methods and/or different numbers of clusters), the index values are calibrated for aggregation. Calibration is relative to a set of random clusterings on the same data. Two specific aggregated indexes are proposed and compared with existing indexes on simulated and real data.
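The calibration-and-aggregation idea in the abstract can be illustrated with a small sketch. This is not the paper's procedure: the two toy indexes (`homogeneity`, `separation`), the uniform random label generator, the z-score calibration, and the weighted mean are all assumptions chosen for illustration, and every function name is hypothetical. It only shows the general pattern of standardising each index against its values on random clusterings of the same data before combining.

```python
import math
import random
import statistics


def pair_dists(points, labels, same):
    """Distances between point pairs that share (same=True) or do not share a cluster."""
    n = len(points)
    return [math.dist(points[i], points[j])
            for i in range(n) for j in range(i + 1, n)
            if (labels[i] == labels[j]) == same]


def homogeneity(points, labels):
    """Toy index: negative mean within-cluster distance (larger is better)."""
    d = pair_dists(points, labels, same=True)
    return -statistics.fmean(d) if d else 0.0


def separation(points, labels):
    """Toy index: smallest distance between points in different clusters."""
    d = pair_dists(points, labels, same=False)
    return min(d) if d else 0.0


def random_labels(n, k, rng):
    """Random assignment of n points to k non-empty clusters (an assumed generator)."""
    labels = list(range(k)) + [rng.randrange(k) for _ in range(n - k)]
    rng.shuffle(labels)
    return labels


def calibrated_score(points, labels, indexes, weights, n_random=200, seed=0):
    """Weighted mean of index values, each standardised against random clusterings."""
    rng = random.Random(seed)
    k = len(set(labels))
    reference = [random_labels(len(points), k, rng) for _ in range(n_random)]
    total = 0.0
    for index, w in zip(indexes, weights):
        ref_values = [index(points, r) for r in reference]
        mu = statistics.fmean(ref_values)
        sd = statistics.pstdev(ref_values)
        # Assumed calibration: z-score relative to the random clusterings.
        z = (index(points, labels) - mu) / sd if sd > 0 else 0.0
        total += w * z
    return total / sum(weights)


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]   # matches the visible groups
bad = [0, 1, 0, 1, 0, 1]    # mixes the groups

indexes = [homogeneity, separation]
weights = [1.0, 1.0]        # user-chosen relevance of each index
score_good = calibrated_score(points, good, indexes, weights)
score_bad = calibrated_score(points, bad, indexes, weights)
print(score_good, score_bad)
```

Because each index is expressed relative to the same random reference set, indexes on different scales become comparable before they are averaged; changing `weights` reflects which characteristics matter in a given application.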
Comments: 42 pages, 11 figures
Subjects: Methodology (stat.ME)
MSC classes: 62H30
Cite as: arXiv:2002.01822 [stat.ME]
  (or arXiv:2002.01822v4 [stat.ME] for this version)

Submission history

From: Christian Hennig
[v1] Wed, 5 Feb 2020 15:08:19 GMT (1056kb,D)
[v2] Fri, 29 May 2020 15:31:35 GMT (1156kb,D)
[v3] Thu, 4 Jun 2020 16:34:30 GMT (1157kb,D)
[v4] Tue, 23 Jun 2020 16:02:38 GMT (1157kb,D)