Mathematics > Statistics Theory
Title: Two-sample Testing for Large, Sparse High-Dimensional Multinomials under Rare/Weak Perturbations
(Submitted on 3 Jul 2020 (v1), revised 9 Nov 2020 (this version, v3), latest version 21 Jun 2022 (v5))
Abstract: Given two samples from possibly different discrete distributions over a common set of size $N$, consider the problem of testing whether these distributions are identical, vs. the following rare/weak perturbation alternative: the frequencies of $N^{1-\beta}$ elements are perturbed by $r(\log N)/(2n)$ in the Hellinger distance, where $n$ is the size of each sample. We adapt the Higher Criticism (HC) test to this setting using P-values obtained from $N$ exact binomial tests. We characterize the asymptotic performance of the HC-based test in terms of the sparsity parameter $\beta$ and the perturbation intensity parameter $r$. Specifically, we derive a region in the $(\beta,r)$-plane where the test asymptotically has maximal power, while having asymptotically no power outside this region. Our analysis distinguishes between the cases of dense ($N\gg n$) and sparse ($N\ll n$) contingency tables. In the dense case, the phase transition curve matches that of an analogous two-sample normal means model.
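The abstract does not spell out the per-element test or the HC normalization, so the following is only a minimal sketch of how "HC over $N$ exact binomial P-values" could be assembled. It assumes equal sample sizes and reads each exact binomial test as conditioning on the element's total count $m_i = x_i + y_i$ and testing $x_i \sim \mathrm{Binomial}(m_i, 1/2)$. That null is exact under Poisson sampling and only approximate under multinomial sampling (where the conditional law is hypergeometric), so treat it as an assumption. The HC form below is the common normalization introduced by Donoho and Jin; the function names and the cutoff `gamma = 0.2` are hypothetical choices, not taken from the paper.

```python
import math
from typing import Sequence


def binom_two_sided_pvalue(x: int, m: int) -> float:
    """Exact two-sided test of H0: X ~ Binomial(m, 1/2), with observed X = x.

    Binomial(m, 1/2) is symmetric, so the two-sided P-value equals twice
    the smaller tail probability, capped at 1.
    """
    if m == 0:
        return 1.0  # element never observed in either sample: no evidence
    k = min(x, m - x)
    lower_tail = sum(math.comb(m, j) for j in range(k + 1))  # = 2^m * P(X <= k)
    return min(1.0, 2 * lower_tail / 2**m)  # exact big-int division to float


def higher_criticism(pvalues: Sequence[float], gamma: float = 0.2) -> float:
    """HC = max over i <= gamma*N of sqrt(N) (i/N - p_(i)) / sqrt(i/N (1 - i/N)).

    gamma is a hypothetical tuning cutoff; i is capped at N - 1 so the
    denominator never vanishes.
    """
    N = len(pvalues)
    if N < 2 or not 0 < gamma < 1:
        raise ValueError("need N >= 2 P-values and 0 < gamma < 1")
    p_sorted = sorted(pvalues)
    i_max = min(max(1, int(gamma * N)), N - 1)
    hc = -math.inf
    for i in range(1, i_max + 1):
        u = i / N
        z = math.sqrt(N) * (u - p_sorted[i - 1]) / math.sqrt(u * (1 - u))
        hc = max(hc, z)
    return hc


def two_sample_hc(x: Sequence[int], y: Sequence[int], gamma: float = 0.2) -> float:
    """Hypothetical two-sample HC: one binomial P-value per element, then HC."""
    if len(x) != len(y):
        raise ValueError("count vectors must index the same N elements")
    pvals = [binom_two_sided_pvalue(xi, xi + yi) for xi, yi in zip(x, y)]
    return higher_criticism(pvals, gamma)


if __name__ == "__main__":
    x = [5] * 100
    y = [20] * 3 + [5] * 97  # a few perturbed elements
    print(two_sample_hc(x, x), two_sample_hc(x, y))
```

Large values of the statistic indicate that a small number of elements carry unusually small P-values, which is the signature of a rare/weak perturbation. In practice the rejection threshold would have to be calibrated, for example by simulation under the null; this sketch only computes the statistic.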
Submission history
From: Alon Kipnis
[v1] Fri, 3 Jul 2020 22:38:28 GMT (902kb,D)
[v2] Mon, 13 Jul 2020 18:53:38 GMT (903kb,D)
[v3] Mon, 9 Nov 2020 22:52:08 GMT (1808kb,D)
[v4] Fri, 4 Jun 2021 07:41:46 GMT (1793kb,D)
[v5] Tue, 21 Jun 2022 07:01:22 GMT (872kb,D)