Robust and sparse k-means clustering for high-dimensional data

Brodinova, Sarka; Filzmoser, Peter; Ortner, Thomas; Breiteneder, Christian; Zaharieva, Maia

(Submitted on 28 Sep 2017)

Abstract: In real-world applications, identifying groups is challenging because the data may contain both outliers and noise variables. A clustering method is therefore needed that can reveal the group structure in such data without any prior knowledge. In this paper, we propose a $k$-means-based algorithm incorporating a weighting function that automatically assigns a weight to each observation. To cope with noise variables, a lasso-type penalty is used in an objective function adjusted by the observation weights. Finally, we introduce a framework for selecting both the number of clusters and variables based on a modified gap statistic. Experiments on simulated and real-world data demonstrate that the method can identify groups, outliers, and informative variables simultaneously.
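
The abstract does not spell out the penalized objective, so the following is only an illustrative sketch of how a lasso-type penalty can switch off noise variables. It assumes the variable-weight update from the sparse k-means framework of Witten and Tibshirani (maximize sum_j w_j * a_j subject to ||w||_2 <= 1, ||w||_1 <= s, w_j >= 0), which may differ from the paper's exact formulation. The names `lasso_variable_weights`, `bcss`, and `l1_bound` are hypothetical, and the per-variable scores are made-up numbers standing in for (observation-weighted) between-cluster sums of squares.

```python
import math


def soft_threshold(values, delta):
    """Shrink each value toward zero by delta, clipping at zero."""
    return [max(v - delta, 0.0) for v in values]


def l2_normalize(values):
    """Scale a vector to unit Euclidean norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def lasso_variable_weights(bcss, l1_bound, tol=1e-10, max_iter=200):
    """Variable weights under an L1 bound, found by bisection on the threshold.

    bcss     : per-variable separation scores (assumed input, hypothetical name)
    l1_bound : the L1 constraint s; assumed to satisfy s >= 1
    """
    if l1_bound < 1:
        raise ValueError("l1_bound is assumed to be at least 1")
    a = [max(v, 0.0) for v in bcss]
    w = l2_normalize(a)
    if sum(w) <= l1_bound:
        return w  # the L1 constraint is inactive, so no shrinkage is needed
    lo, hi = 0.0, max(a)
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        w = l2_normalize(soft_threshold(a, mid))
        if sum(w) > l1_bound:
            lo = mid  # still too dense: shrink harder
        else:
            hi = mid  # feasible: try a smaller threshold
        if hi - lo < tol:
            break
    return l2_normalize(soft_threshold(a, hi))


# Two informative variables and three noise variables (illustrative scores only).
weights = lasso_variable_weights([10.0, 8.0, 0.5, 0.2, 0.1], l1_bound=1.2)
print([round(w, 3) for w in weights])
```

With a tight bound, the variables with small scores receive a weight of exactly zero, which is the sense in which such a penalty performs variable selection. In a full algorithm this update would alternate with the cluster assignment and observation-weighting steps, which are not shown here.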

Subjects:	Methodology (stat.ME)
Cite as:	arXiv:1709.10012 [stat.ME]
	(or arXiv:1709.10012v1 [stat.ME] for this version)

Submission history

From: Sarka Brodinova
[v1] Thu, 28 Sep 2017 15:14:49 GMT (101kb)