Mathematics > Statistics Theory
Title: Randomized incomplete $U$-statistics in high dimensions
(Submitted on 3 Dec 2017 (v1), last revised 27 Jan 2019 (this version, v4))
Abstract: This paper studies inference for the mean vector of a high-dimensional $U$-statistic. In the era of Big Data, the dimension $d$ of the $U$-statistic and the sample size $n$ of the observations both tend to be large, and the computation of the $U$-statistic is prohibitively demanding. Data-dependent inferential procedures such as the empirical bootstrap for $U$-statistics are even more computationally expensive. To overcome this computational bottleneck, incomplete $U$-statistics obtained by sampling fewer terms of the $U$-statistic are attractive alternatives. In this paper, we introduce randomized incomplete $U$-statistics with sparse weights whose computational cost can be made independent of the order of the $U$-statistic. We derive non-asymptotic Gaussian approximation error bounds for the randomized incomplete $U$-statistics in high dimensions, namely in cases where the dimension $d$ is possibly much larger than the sample size $n$, for both non-degenerate and degenerate kernels. In addition, we propose generic bootstrap methods for the incomplete $U$-statistics that are computationally much less demanding than existing bootstrap methods, and establish finite-sample validity of the proposed bootstrap methods. Our methods are illustrated by an application to nonparametric testing for the pairwise independence of a high-dimensional random vector under weaker assumptions than those appearing in the literature.
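As a rough illustration of what "sampling fewer terms of the $U$-statistic" can mean, the sketch below compares a complete $U$-statistic with an incomplete one that averages the kernel over a fixed number of randomly drawn index tuples. This is a minimal sketch under assumptions, not the construction or the bootstrap analyzed in the paper: the sampling scheme (uniform draws with replacement), the scalar variance kernel, and all function and parameter names (`complete_u_statistic`, `incomplete_u_statistic`, `budget`) are chosen here for illustration only.

```python
import random
from itertools import combinations


def variance_kernel(x, y):
    # Symmetric order-2 kernel (illustrative choice); its complete
    # U-statistic equals the unbiased sample variance.
    return 0.5 * (x - y) ** 2


def complete_u_statistic(data, kernel, r):
    # Average the kernel over all C(n, r) subsets of size r.
    # The number of terms grows like n**r, which is the bottleneck.
    tuples = list(combinations(data, r))
    return sum(kernel(*t) for t in tuples) / len(tuples)


def incomplete_u_statistic(data, kernel, r, budget, rng):
    # Assumed scheme: draw `budget` index tuples uniformly at random,
    # with replacement across draws, and average the kernel over them.
    # The cost is `budget` kernel evaluations, whatever n and r are.
    n = len(data)
    total = 0.0
    for _ in range(budget):
        idx = rng.sample(range(n), r)  # r distinct indices
        total += kernel(*(data[i] for i in idx))
    return total / budget


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
    print("complete:  ", complete_u_statistic(sample, variance_kernel, 2))
    print("incomplete:", incomplete_u_statistic(sample, variance_kernel, 2, 2000, rng))
```

Because the kernel is symmetric, drawing `r` distinct indices in random order is equivalent to drawing a uniformly random subset of size `r`, so the incomplete average is an unbiased estimate of the complete one given the data. The example is scalar for brevity; a high-dimensional version would use a vector-valued kernel and average coordinate-wise.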
Submission history
From: Xiaohui Chen
[v1] Sun, 3 Dec 2017 14:01:42 GMT (5600kb,D)
[v2] Mon, 18 Dec 2017 16:34:38 GMT (9376kb,D)
[v3] Wed, 13 Jun 2018 06:26:12 GMT (133kb,D)
[v4] Sun, 27 Jan 2019 20:11:03 GMT (176kb,D)