Title: Computationally efficient sparse clustering

Abstract: We study statistical and computational limits of clustering when the cluster centres are sparse and their dimension is possibly much larger than the sample size. Our theoretical analysis focuses on the simple model $X_i = z_i \theta + \varepsilon_i$, $z_i \in \{-1,1\}$, $\varepsilon_i \thicksim \mathcal{N}(0, I)$, which has two clusters with centres $\theta$ and $-\theta$.
We provide a finite sample analysis of a new sparse clustering algorithm based on sparse PCA and show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \rightarrow \infty$, matching asymptotically the Bayes error.
Our results require the sparsity to grow slower than the square root of the sample size. Using a recent framework for computational lower bounds---the low-degree likelihood ratio---we give evidence that this condition is necessary for any polynomial-time clustering algorithm to succeed below the BBP threshold. This complements existing evidence based on reductions and statistical query lower bounds. Compared to these existing results, we cover a wider set of parameter regimes and give a more precise understanding of the runtime required and the misclustering error achievable.
We also discuss extensions of our results to more than two clusters.
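
As a rough illustration of the two-cluster model in the abstract, the following Python sketch simulates $X_i = z_i \theta + \varepsilon_i$ and clusters the sample by the sign of its projection onto an estimated sparse direction. It is not the algorithm analysed in the paper: the support-selection step (keeping the coordinates with the largest empirical second moments, often called diagonal thresholding) is a simple stand-in for sparse PCA, and the function names, the parameter values and the assumption that the sparsity `s` is known are all hypothetical choices made for this example. The comparison value $\Phi(-\|\theta\|)$ is the error of the rule $\mathrm{sign}(\langle X_i, \theta \rangle)$ when $\theta$ is known, used here as a benchmark.

```python
import numpy as np
from math import erf, sqrt


def simulate(n, p, s, signal, rng):
    """Draw X_i = z_i * theta + eps_i with an s-sparse theta of norm `signal`."""
    theta = np.zeros(p)
    theta[:s] = signal / np.sqrt(s)          # equal weight on the first s coordinates
    z = rng.choice([-1, 1], size=n)          # hidden cluster labels
    X = z[:, None] * theta[None, :] + rng.standard_normal((n, p))
    return X, z, theta


def sparse_pca_cluster(X, s):
    """Illustrative stand-in for sparse PCA: diagonal thresholding, then sign clustering."""
    # Coordinates in the support of theta have E[X_ij^2] = 1 + theta_j^2 > 1.
    second_moments = np.mean(X ** 2, axis=0)
    support = np.argsort(second_moments)[-s:]    # assumes s is known
    sub = X[:, support]
    # Leading eigenvector of the second-moment matrix restricted to the selected coordinates.
    _, eigvecs = np.linalg.eigh(sub.T @ sub / X.shape[0])
    v = eigvecs[:, -1]
    labels = np.sign(sub @ v)
    labels[labels == 0] = 1                      # break exact ties arbitrarily
    return labels.astype(int)


def misclustering_rate(labels, z):
    """Fraction of wrong labels, up to a global sign flip of the clustering."""
    err = np.mean(labels != z)
    return min(err, 1 - err)


def bayes_error(theta_norm):
    """Phi(-||theta||): error of sign(<X, theta>) when theta is known."""
    return 0.5 * (1 + erf(-theta_norm / sqrt(2)))


rng = np.random.default_rng(0)
X, z, theta = simulate(n=500, p=2000, s=10, signal=3.0, rng=rng)
labels = sparse_pca_cluster(X, s=10)
print("misclustering rate:", misclustering_rate(labels, z))
print("benchmark Phi(-||theta||):", bayes_error(np.linalg.norm(theta)))
```

In this toy setting the dimension is four times the sample size, yet restricting to a few selected coordinates before taking the leading eigenvector keeps the projection informative. The sketch makes no attempt to reproduce the paper's finite-sample guarantees or its condition on the sparsity.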
Comments: 26 pages
Subjects: Statistics Theory (math.ST); Computational Complexity (cs.CC); Machine Learning (cs.LG); Machine Learning (stat.ML)
MSC classes: 62H30
Cite as: arXiv:2005.10817 [math.ST]
  (or arXiv:2005.10817v2 [math.ST] for this version)

Submission history

From: Alexander Wein
[v1] Thu, 21 May 2020 17:51:30 GMT (73kb)
[v2] Mon, 25 May 2020 17:21:09 GMT (73kb)