Non-Parametric Cluster Significance Testing with Reference to a Unimodal Null Distribution

Helgeson, Erika S.; Bair, Eric

Ancillary files: proofs.pdf

(Submitted on 5 Oct 2016 (v1), last revised 6 Oct 2016 (this version, v2))

Abstract: Cluster analysis is an unsupervised learning strategy that can be employed to identify subgroups of observations in data sets of unknown structure. This strategy is particularly useful for analyzing high-dimensional data such as microarray gene expression data. Many clustering methods are available, but it is challenging to determine whether the identified clusters represent distinct subgroups. We propose a novel strategy to investigate the significance of identified clusters by comparing the within-cluster sum of squares from the original data to that produced by clustering an appropriate unimodal null distribution. The null distribution we present for this problem uses kernel density estimation and thus does not require that the data follow any particular distribution. We find that our method can accurately test for the presence of clustering even when the number of features is high.
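
The general testing idea in the abstract (cluster the data, compute a within-cluster sum of squares, and compare it with the values obtained by clustering samples from a unimodal null) can be illustrated as a Monte Carlo test. This is a minimal sketch under stated assumptions, not the authors' procedure. The abstract says their null distribution is built with kernel density estimation but does not give the construction, so the sketch swaps in a hypothetical parametric stand-in: a single Gaussian matched to the sample mean and covariance. The use of 2-means via Lloyd's algorithm, the normalization of the within-cluster sum of squares by the total sum of squares, the lower-tail p-value, and all function names (`wcss_ratio_2means`, `sample_gaussian_null`, `cluster_significance_test`) are illustrative assumptions.

```python
import numpy as np


def wcss_ratio_2means(x, n_init=5, n_iter=50, rng=None):
    """2-means within-cluster sum of squares divided by the total sum of squares.

    Smaller values mean the data split more cleanly into two groups.
    Uses Lloyd's algorithm with several random starts and keeps the best fit.
    """
    rng = np.random.default_rng(rng)
    total_ss = np.sum((x - x.mean(axis=0)) ** 2)
    best_wcss = np.inf
    for _ in range(n_init):
        # Start from two distinct observations chosen at random.
        centers = x[rng.choice(len(x), size=2, replace=False)]
        for _ in range(n_iter):
            # Assign each observation to its nearest center.
            sq_dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = sq_dists.argmin(axis=1)
            # Move each center to the mean of its members (keep it if the cluster is empty).
            new_centers = np.array([
                x[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                for k in range(2)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        sq_dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best_wcss = min(best_wcss, sq_dists.min(axis=1).sum())
    return best_wcss / total_ss


def sample_gaussian_null(x, rng):
    """HYPOTHETICAL stand-in null: one Gaussian with the sample mean and covariance.

    The paper's null is instead built with kernel density estimation; this
    parametric substitute only illustrates where a unimodal null plugs in.
    """
    mean = x.mean(axis=0)
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    return rng.multivariate_normal(mean, cov, size=len(x))


def cluster_significance_test(x, n_sim=100, seed=0):
    """Monte Carlo test: compare the observed WCSS ratio with ratios from null samples."""
    rng = np.random.default_rng(seed)
    observed = wcss_ratio_2means(x, rng=rng)
    null_ratios = np.array([
        wcss_ratio_2means(sample_gaussian_null(x, rng), rng=rng)
        for _ in range(n_sim)
    ])
    # Strong clustering gives a small ratio, so the p-value uses the lower tail.
    p_value = (1 + np.sum(null_ratios <= observed)) / (n_sim + 1)
    return observed, p_value


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    two_groups = np.vstack([demo_rng.normal(-3, 1, (50, 5)),
                            demo_rng.normal(3, 1, (50, 5))])
    one_group = demo_rng.normal(0, 1, (100, 5))
    print("two groups:", cluster_significance_test(two_groups, n_sim=50))
    print("one group: ", cluster_significance_test(one_group, n_sim=50))
```

Because this stand-in null is Gaussian, unimodal data that are skewed or heavy-tailed can look "clustered" relative to it; avoiding that dependence on a particular parametric form is the motivation the abstract gives for a null based on kernel density estimation.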

Subjects:	Methodology (stat.ME); Machine Learning (stat.ML)
Cite as:	arXiv:1610.01424 [stat.ME]
	(or arXiv:1610.01424v2 [stat.ME] for this version)

Submission history

From: Erika Helgeson
[v1] Wed, 5 Oct 2016 14:01:57 GMT (1522kb,AD)
[v2] Thu, 6 Oct 2016 00:18:17 GMT (1747kb,AD)
