Computer Science > Discrete Mathematics
Title: Systematic Analysis of Cluster Similarity Indices: Towards Bias-free Cluster Validation
(Submitted on 12 Nov 2019 (this version), latest version 26 Aug 2021 (v7))
Abstract: There are many cluster similarity indices used to evaluate clustering algorithms, and choosing the best one for a particular task is usually an open problem. In this paper, we perform a thorough analysis of this problem: we develop a list of desirable properties (requirements) and theoretically verify which indices satisfy them. In particular, we investigate dozens of pair-counting indices and prove that none of them satisfies all the requirements. Based on our analysis, we propose using the arccosine of the correlation coefficient as a similarity measure and prove that it satisfies almost all requirements (except for one, which is still satisfied asymptotically). This new measure can be thought of as an angle between partitions.
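The abstract does not spell out the formula, so the following is only an illustrative sketch of the "angle between partitions" idea under stated assumptions. It assumes the correlation coefficient is the Pearson correlation between two binary vectors with one entry per unordered pair of elements (1 if the pair shares a cluster, 0 otherwise), which is the usual pair-counting view. The function names `pair_indicator` and `partition_angle` are hypothetical, and the paper may apply a different normalization.

```python
import math
from itertools import combinations


def pair_indicator(labels):
    """Return one 0/1 entry per unordered pair: 1 if both elements share a cluster."""
    return [1 if labels[i] == labels[j] else 0
            for i, j in combinations(range(len(labels)), 2)]


def partition_angle(labels_a, labels_b):
    """Arccosine of the Pearson correlation between pair indicators (assumed definition).

    Returns an angle in radians in [0, pi]; 0 means the partitions coincide.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same elements")
    x = pair_indicator(labels_a)
    y = pair_indicator(labels_b)
    n = len(x)
    if n == 0:
        raise ValueError("need at least two elements")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # All pairs together or all pairs apart: the correlation is undefined.
        raise ValueError("correlation undefined for a trivial partition")
    r = cov / math.sqrt(var_x * var_y)
    r = max(-1.0, min(1.0, r))  # guard against floating-point drift outside [-1, 1]
    return math.acos(r)


if __name__ == "__main__":
    print(partition_angle([0, 0, 1, 1], [5, 5, 7, 7]))  # same partition, relabeled
    print(partition_angle([0, 0, 1, 1], [0, 1, 0, 1]))  # a conflicting partition
```

Because the sketch only looks at whether pairs are co-clustered, it is invariant to how cluster labels are named. It enumerates all pairs explicitly, so it is quadratic in the number of elements and intended for small examples only.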
Submission history
From: Liudmila Ostroumova Prokhorenkova
[v1] Tue, 12 Nov 2019 10:25:47 GMT (58kb)
[v2] Fri, 6 Mar 2020 14:01:30 GMT (51kb)
[v3] Thu, 14 May 2020 06:29:00 GMT (51kb)
[v4] Fri, 3 Jul 2020 08:22:05 GMT (75kb,D)
[v5] Mon, 25 Jan 2021 20:24:14 GMT (1093kb,D)
[v6] Wed, 30 Jun 2021 15:44:59 GMT (168kb,D)
[v7] Thu, 26 Aug 2021 10:58:32 GMT (163kb,D)