Title: Systematic Analysis of Cluster Similarity Indices: Towards Bias-free Cluster Validation

Abstract: Many cluster similarity indices are used to evaluate clustering algorithms, and choosing the best one for a particular task is usually an open problem. In this paper, we perform a thorough analysis of this problem: we develop a list of desirable properties (requirements) and theoretically verify which indices satisfy them. In particular, we investigate dozens of pair-counting indices and prove that none of them satisfies all the requirements. Based on our analysis, we propose using the arccosine of the correlation coefficient as a similarity measure and prove that it satisfies all requirements except one, which is still satisfied asymptotically. This new measure can be thought of as an angle between partitions.
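To make the "angle between partitions" idea concrete, here is a minimal sketch of how such a measure might be computed. It assumes, and the abstract does not spell this out, that the correlation coefficient is the Pearson correlation between the two binary vectors that record, for every pair of elements, whether the pair shares a cluster in each partition. The function names `pair_counts` and `correlation_angle` are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Count element pairs by co-membership in two partitions.

    Returns (n11, n10, n01, n00): pairs together in both partitions,
    together only in A, together only in B, and together in neither.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same elements")
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            n11 += 1
        elif same_a:
            n10 += 1
        elif same_b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def correlation_angle(labels_a, labels_b):
    """Arccosine of the pair-level Pearson correlation (assumed definition)."""
    n11, n10, n01, n00 = pair_counts(labels_a, labels_b)
    denom = math.sqrt((n11 + n10) * (n11 + n01) * (n00 + n10) * (n00 + n01))
    if denom == 0:
        # One partition puts all pairs together or none; correlation is undefined.
        raise ValueError("correlation is undefined for a trivial partition")
    cc = (n11 * n00 - n10 * n01) / denom
    cc = max(-1.0, min(1.0, cc))  # guard against floating-point drift
    return math.acos(cc)
```

Under these assumptions, two partitions that agree up to relabeling give an angle of 0, and the angle grows toward pi as the pairwise co-memberships become anti-correlated. The quadratic loop over pairs is for clarity only; a contingency-table computation would scale better.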
Subjects: Discrete Mathematics (cs.DM); Probability (math.PR)
Cite as: arXiv:1911.04773 [cs.DM]
  (or arXiv:1911.04773v1 [cs.DM] for this version)

Submission history

From: Liudmila Ostroumova Prokhorenkova
[v1] Tue, 12 Nov 2019 10:25:47 GMT (58kb)
