
Title: A new distance measurement and its application in K-Means Algorithm

Abstract: The K-Means clustering algorithm is one of the most commonly used clustering algorithms because of its simplicity and efficiency. K-Means based on Euclidean distance considers only the straight-line distance between samples and ignores the overall distribution structure of the dataset (i.e., its manifold structure). Because Euclidean distance has difficulty describing the structural relationship between two data points in a high-dimensional space, we propose a new distance measurement, called view-distance, and apply it to the K-Means algorithm. On the classical manifold learning datasets, the S-curve and the Swiss roll, this new distance not only clusters the data according to the data's own structure but also produces clean dividing lines at the boundaries between categories. We also tested the classification accuracy and clustering quality of view-distance-based K-Means on several real-world datasets. The experimental results show that, on most of these datasets, K-Means based on view-distance improves classification accuracy and clustering quality to some degree.
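For reference, the baseline the abstract critiques is K-Means with Euclidean distance. The following is a minimal sketch of Lloyd-style K-Means in which the distance function is a parameter, so a different metric could be swapped in at the assignment step. The abstract does not define view-distance, so it is not implemented here; the names `euclidean`, `kmeans`, and the `distance` parameter are hypothetical illustrations, not the authors' code. Keeping the arithmetic-mean update step is an assumption of this sketch.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance: the baseline metric discussed in the abstract."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int,
           distance: Callable[[Sequence[float], Sequence[float]], float] = euclidean,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Hypothetical Lloyd-style K-Means with a pluggable distance function.

    The assignment step uses `distance`; the update step still takes the
    arithmetic mean of each cluster, as in standard K-Means (an assumption
    of this sketch, not necessarily what the paper does with view-distance).
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points chosen at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
    labels, cents = kmeans(pts, k=2)
    print(labels, cents)
```

Passing another callable as `distance` changes only how points are assigned to clusters. With a non-Euclidean metric the mean update is no longer guaranteed to minimise that metric's within-cluster cost, which is one reason a structure-aware distance generally needs more care than a drop-in replacement.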
Comments: 12 pages, 6 figures, 3 tables
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
MSC classes: 62-08
ACM classes: E.1
Cite as: arXiv:2206.05215 [cs.LG]
  (or arXiv:2206.05215v1 [cs.LG] for this version)

Submission history

From: Hou-Biao Li
[v1] Fri, 10 Jun 2022 16:26:22 GMT (2633kb,D)
