Title: Estimation of the number of spiked eigenvalues in a covariance matrix by bulk eigenvalue matching analysis
(Submitted on 31 May 2020 (this version), latest version 6 Jan 2021 (v2))
Abstract: The spiked covariance model has gained increasing popularity in high-dimensional data analysis. A fundamental problem is determining the number of spiked eigenvalues, $K$. For estimation of $K$, most attention has focused on the top eigenvalues of the sample covariance matrix, and there has been little investigation into proper ways of using the bulk eigenvalues to estimate $K$. We propose a principled approach to incorporating bulk eigenvalues in the estimation of $K$. Our method imposes a working model on the residual covariance matrix, which is assumed to be a diagonal matrix whose entries are drawn from a gamma distribution. Under this model, the bulk eigenvalues are asymptotically close to the quantiles of a fixed parametric distribution. This motivates a two-step method: the first step uses the bulk eigenvalues to estimate the parameters of this distribution, and the second step uses these parameters to assist the estimation of $K$. The resulting estimator $\hat{K}$ aggregates information from a large number of bulk eigenvalues. We show the consistency of $\hat{K}$ under a standard spiked covariance model. We also propose a confidence interval estimate for $K$. Our extensive simulation studies show that the proposed method is robust and outperforms existing methods in a range of scenarios. We apply the proposed method to the analysis of a lung cancer microarray data set and the 1000 Genomes data set.
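To make the setting in the abstract concrete, the following is a minimal simulation sketch, not the authors' estimator. It draws data from a toy spiked covariance model whose residual covariance is diagonal with gamma-distributed entries, then returns the sorted sample eigenvalues so that the top $K$ can be compared with the bulk. The function name `simulate_spiked_eigenvalues` and every parameter value (`n`, `p`, `K`, `spike`, `shape`) are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def simulate_spiked_eigenvalues(n=500, p=200, K=3, spike=50.0, shape=4.0, seed=0):
    """Sample covariance eigenvalues under a toy spiked model (illustrative only).

    Population covariance: Sigma = spike * U U^T + diag(d), where U has K
    orthonormal columns and d has gamma-distributed entries with mean 1.
    """
    rng = np.random.default_rng(seed)
    # Residual variances: Gamma(shape, scale=1/shape), so the mean is 1.
    d = rng.gamma(shape, 1.0 / shape, size=p)
    # K orthonormal spike directions from a QR decomposition.
    U, _ = np.linalg.qr(rng.standard_normal((p, K)))
    # Each row of X has covariance spike * U U^T + diag(d).
    factors = rng.standard_normal((n, K)) * np.sqrt(spike)
    noise = rng.standard_normal((n, p)) * np.sqrt(d)
    X = factors @ U.T + noise
    # Sample covariance (the mean is known to be zero in this toy model).
    S = X.T @ X / n
    # eigvalsh returns ascending order; reverse to get descending.
    eigvals = np.linalg.eigvalsh(S)[::-1]
    return eigvals, d

if __name__ == "__main__":
    eigvals, d = simulate_spiked_eigenvalues()
    print("top 5 eigenvalues:", np.round(eigvals[:5], 2))
    print("median bulk eigenvalue:", round(float(np.median(eigvals[3:])), 2))
```

With a strong spike, as assumed here, the gap after the $K$-th eigenvalue is visible by eye. The harder case the abstract targets is when the spikes are weak relative to the spread of the bulk, which is where information from the bulk eigenvalues becomes useful.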
Submission history
From: Zheng Tracy Ke
[v1] Sun, 31 May 2020 04:36:07 GMT (3302kb,D)
[v2] Wed, 6 Jan 2021 03:35:40 GMT (3333kb,D)