Distributional Clustering: A distribution-preserving clustering method

Krishna, Arvind; Mak, Simon; Joseph, Roshan

(Submitted on 14 Nov 2019)

Abstract: One key use of k-means clustering is to identify cluster prototypes, which can serve as representative points for a dataset. However, a drawback of using k-means cluster centers as representative points is that they distort the distribution of the underlying data. This can be highly disadvantageous in problems where the representative points are subsequently used to gain insight into the data distribution, since the points do not mimic that distribution. To this end, we propose a new clustering method, called "distributional clustering", which ensures that cluster centers capture the distribution of the underlying data. We first prove the asymptotic convergence of the proposed cluster centers to the data-generating distribution, then present an efficient algorithm for computing these cluster centers in practice. Finally, we demonstrate the effectiveness of distributional clustering on synthetic and real datasets.
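The distortion the abstract describes can be seen in one dimension. A classical result from quantization theory says that, as k grows, k-means centers in d dimensions have point density proportional to f^(d/(d+2)) rather than to the data density f; in 1-D that is f^(1/3), so for N(0,1) data the centers spread out like an N(0,3) shape (standard deviation approaching √3 ≈ 1.73). The sketch below is NOT the authors' distributional clustering algorithm. It runs plain Lloyd's k-means on Gaussian data and compares the centers with equal-mass quantile points, a simple 1-D baseline that follows the data distribution by construction. The helper names `lloyd_1d` and `quantile_points`, the sample size, k = 16, and the seed are illustrative assumptions.

```python
import bisect
import random
import statistics
from itertools import accumulate


def quantile_points(xs, k):
    """Equal-mass representative points: the (i + 0.5)/k empirical quantiles.

    Hypothetical helper. `xs` must already be sorted; each returned point
    stands for roughly n/k observations, so the points track the data density.
    """
    n = len(xs)
    return [xs[int((i + 0.5) * n / k)] for i in range(k)]


def lloyd_1d(data, k, iters=1000):
    """Plain k-means (Lloyd's algorithm) on 1-D data. Hypothetical helper.

    In 1-D every k-means cell is a contiguous run of the sorted data, so one
    iteration needs only the k-1 cell boundaries (midpoints between adjacent
    centers) plus prefix sums to compute all cell means in O(k log n) time.
    """
    xs = sorted(data)
    n = len(xs)
    prefix = [0.0, *accumulate(xs)]  # prefix[i] == sum(xs[:i])
    centers = quantile_points(xs, k)  # start from the equal-mass points
    for _ in range(iters):
        mids = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
        cuts = [0, *(bisect.bisect_left(xs, m) for m in mids), n]
        new = []
        for lo, hi, c in zip(cuts, cuts[1:], centers):
            # Cell mean; an empty cell keeps its previous center.
            new.append((prefix[hi] - prefix[lo]) / (hi - lo) if hi > lo else c)
        centers = sorted(new)
    return centers


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
k = 16

km_centers = lloyd_1d(data, k)
q_points = quantile_points(sorted(data), k)

print(f"data std:           {statistics.pstdev(data):.3f}")
print(f"k-means center std: {statistics.pstdev(km_centers):.3f}")
print(f"quantile point std: {statistics.pstdev(q_points):.3f}")
```

With these settings the k-means centers should come out clearly more spread out than the data (their tail centers sit far into the tails to minimize squared error), while the quantile points stay slightly inside the data's spread. That mismatch is the kind of distortion that motivates a distribution-preserving alternative.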

Comments:	Submitted to Statistica Sinica
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:1911.05940 [stat.ML]
	(or arXiv:1911.05940v1 [stat.ML] for this version)

Submission history

From: Arvind Krishna
[v1] Thu, 14 Nov 2019 05:06:28 GMT (201kb,D)
