
Title: Permutation methods for factor analysis and PCA

Abstract: Researchers often have datasets measuring features $x_{ij}$ of samples, such as test scores of students. In factor analysis and PCA, these features are thought to be influenced by unobserved factors, such as skills. Can we determine how many components affect the data? This is an important problem, because it has a large impact on all downstream data analysis. Consequently, many approaches have been developed to address it. Parallel Analysis is a popular permutation method. It works by randomly scrambling each feature of the data. It selects components if their singular values are larger than those of the permuted data. Despite widespread use in leading textbooks and scientific publications, as well as empirical evidence for its accuracy, it currently has no theoretical justification.
In this paper, we show that the parallel analysis permutation method consistently selects the large components in certain high-dimensional factor models. However, it does not select the smaller components. The intuition is that permutations keep the noise invariant, while "destroying" the low-rank signal. This provides justification for permutation methods in PCA and factor models under some conditions. Our work uncovers drawbacks of permutation methods, and paves the way to improvements.
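The permutation procedure described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `parallel_analysis`, the number of permutations `n_perm`, the comparison quantile `quantile`, and the sequential stopping rule are all assumptions made for the example, and the toy data at the end is synthetic.

```python
import numpy as np

def parallel_analysis(X, n_perm=20, quantile=0.95, seed=0):
    """Hypothetical sketch of permutation-based parallel analysis.

    X is an (n samples) x (p features) array. Returns the number of
    leading components whose singular values exceed the chosen quantile
    of the corresponding singular values of column-permuted copies of X.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Singular values of the observed data, in decreasing order.
    sv = np.linalg.svd(X, compute_uv=False)
    perm_sv = np.empty((n_perm, sv.size))
    for b in range(n_perm):
        # Scramble each feature (column) independently of the others.
        Xp = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(X.shape[1])]
        )
        perm_sv[b] = np.linalg.svd(Xp, compute_uv=False)
    # Per-rank threshold taken from the permuted singular values.
    thresh = np.quantile(perm_sv, quantile, axis=0)
    # Assumed stopping rule: select components until one fails the test.
    k = 0
    while k < sv.size and sv[k] > thresh[k]:
        k += 1
    return k

# Toy data (assumed for illustration): one strong factor plus unit noise.
rng = np.random.default_rng(1)
n, p = 200, 50
u = rng.standard_normal(n)
v = rng.standard_normal(p)
X = 3.0 * np.outer(u, v) + rng.standard_normal((n, p))
k_hat = parallel_analysis(X)
print(k_hat)
```

Permuting a column keeps its entries, and hence its marginal distribution, but breaks its alignment with the other columns. This is the intuition the abstract gives for why the noise stays invariant while the low-rank signal is "destroyed".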
Comments: To appear in the Annals of Statistics
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
Cite as: arXiv:1710.00479 [math.ST]
  (or arXiv:1710.00479v3 [math.ST] for this version)

Submission history

From: Edgar Dobriban
[v1] Mon, 2 Oct 2017 04:29:22 GMT (369kb,D)
[v2] Sat, 6 Oct 2018 15:44:39 GMT (723kb,D)
[v3] Fri, 13 Sep 2019 13:25:11 GMT (723kb,D)
