Mathematics > Statistics Theory
Title: Permutation methods for factor analysis and PCA
(Submitted on 2 Oct 2017 (v1), last revised 13 Sep 2019 (this version, v3))
Abstract: Researchers often have datasets measuring features $x_{ij}$ of samples, such as test scores of students. In factor analysis and PCA, these features are thought to be influenced by unobserved factors, such as skills. Can we determine how many components affect the data? This is an important problem, because it has a large impact on all downstream data analysis. Consequently, many approaches have been developed to address it. Parallel Analysis is a popular permutation method. It works by randomly scrambling each feature of the data. It selects components if their singular values are larger than those of the permuted data. Despite widespread use in leading textbooks and scientific publications, as well as empirical evidence for its accuracy, it currently has no theoretical justification.
In this paper, we show that the parallel analysis permutation method consistently selects the large components in certain high-dimensional factor models. However, it does not select the smaller components. The intuition is that permutations keep the noise invariant, while "destroying" the low-rank signal. This provides justification for permutation methods in PCA and factor models under some conditions. Our work uncovers drawbacks of permutation methods, and paves the way to improvements.
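The procedure described in the abstract (scramble each feature independently, then keep components whose singular values exceed those of the permuted data) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `parallel_analysis`, the number of permutations `n_perm`, the 95% quantile threshold, and the rule of stopping at the first component that fails the comparison are all assumptions made here, and other variants of parallel analysis exist.

```python
import numpy as np


def parallel_analysis(X, n_perm=20, quantile=0.95, seed=0):
    """Illustrative sketch of permutation-based component selection.

    X is an (n_samples, n_features) array. The values of n_perm and
    quantile, and the sequential stopping rule, are assumptions of
    this sketch rather than choices taken from the paper.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # center each feature

    # Singular values of the observed data, in decreasing order.
    sv = np.linalg.svd(X, compute_uv=False)

    # Singular values of permuted copies: each column (feature) is
    # shuffled independently, which breaks dependence across features
    # while leaving each feature's marginal distribution unchanged.
    perm_sv = np.empty((n_perm, sv.size))
    for b in range(n_perm):
        Xp = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(X.shape[1])]
        )
        perm_sv[b] = np.linalg.svd(Xp, compute_uv=False)

    # Per-rank threshold: the chosen quantile over the permutations.
    thresholds = np.quantile(perm_sv, quantile, axis=0)

    # Select leading components until one fails to exceed its threshold.
    k = 0
    for observed, threshold in zip(sv, thresholds):
        if observed <= threshold:
            break
        k += 1
    return k, sv, thresholds
```

On synthetic data with a few strong factors plus independent noise, the returned `k` would typically match the number of strong factors. Consistent with the abstract, weaker factors can fall below the permutation threshold and go unselected.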
Submission history
From: Edgar Dobriban
[v1] Mon, 2 Oct 2017 04:29:22 GMT (369kb,D)
[v2] Sat, 6 Oct 2018 15:44:39 GMT (723kb,D)
[v3] Fri, 13 Sep 2019 13:25:11 GMT (723kb,D)