Statistics > Methodology
Title: Testing for Outliers with Conformal p-values
(Submitted on 16 Apr 2021 (v1), last revised 25 May 2022 (this version, v3))
Abstract: This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Furthermore, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by numerical experiments on real and simulated data.
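The marginal conformal p-values mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual split-conformal setup: an outlier score (larger means more unusual) has been fit on training data and then evaluated on a held-out calibration set of inliers and on the test points. The function names `conformal_pvalues` and `benjamini_hochberg` are hypothetical, and the Benjamini-Hochberg step is used here only as a standard false discovery rate procedure; the abstract itself does not name the procedure, and the paper's conditionally valid p-values are not reproduced here.

```python
import numpy as np


def conformal_pvalues(cal_scores, test_scores):
    """Marginal conformal p-values, assuming larger scores are more outlying.

    p_j = (1 + #{i : cal_scores[i] >= test_scores[j]}) / (n + 1)
    """
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = cal.size
    # searchsorted(side="left") counts calibration scores strictly below each
    # test score, so n minus that count is the number that are >= it.
    n_ge = n - np.searchsorted(cal, test, side="left")
    return (1.0 + n_ge) / (n + 1.0)


def benjamini_hochberg(pvals, alpha=0.1):
    """Return a boolean mask of the hypotheses rejected by the BH step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the k-th smallest p-value with alpha * k / m.
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest index passing its threshold
        reject[order[: k + 1]] = True
    return reject
```

In use, `conformal_pvalues(cal_scores, test_scores)` would be called once per batch of test points and the result passed to `benjamini_hochberg` to flag likely outliers. Because every test p-value is computed against the same calibration scores, the p-values are mutually dependent, which is the dependence structure the abstract discusses.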
Submission history
From: Matteo Sesia
[v1] Fri, 16 Apr 2021 17:59:21 GMT (4337kb,D)
[v2] Mon, 19 Apr 2021 16:31:16 GMT (4339kb,D)
[v3] Wed, 25 May 2022 02:35:07 GMT (5063kb,D)