We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

math.ST

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Mathematics > Statistics Theory

Title: Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization

Abstract: We study the problem of detecting a structured, low-rank signal matrix corrupted with additive Gaussian noise. This includes clustering in a Gaussian mixture model, sparse PCA, and submatrix localization. Each of these problems is conjectured to exhibit a sharp information-theoretic threshold, below which the signal is too weak for any algorithm to detect. We derive upper and lower bounds on these thresholds by applying the first and second moment methods to the likelihood ratio between these "planted models" and null models where the signal matrix is zero. Our bounds differ by at most a factor of root two when the rank is large (in the clustering and submatrix localization problems, when the number of clusters or blocks is large) or the signal matrix is very sparse. Moreover, our upper bounds show that for each of these problems there is a significant regime where reliable detection is information- theoretically possible but where known algorithms such as PCA fail completely, since the spectrum of the observed matrix is uninformative. This regime is analogous to the conjectured 'hard but detectable' regime for community detection in sparse graphs.
Comments: For sparse PCA and submatrix localization, we determine the information-theoretic threshold exactly in the limit where the number of blocks is large or the signal matrix is very sparse based on a conditional second moment method, closing the factor of root two gap in the first version
Subjects: Statistics Theory (math.ST); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Probability (math.PR)
Cite as: arXiv:1607.05222 [math.ST]
  (or arXiv:1607.05222v2 [math.ST] for this version)

Submission history

From: Jess Banks [view email]
[v1] Mon, 18 Jul 2016 18:09:53 GMT (56kb,D)
[v2] Mon, 23 Jan 2017 06:20:25 GMT (74kb,D)

Link back to: arXiv, form interface, contact.