Statistics > Methodology
Title: Machine learning meets false discovery rate
(Submitted on 13 Aug 2022 (v1), last revised 22 Oct 2022 (this version, v2))
Abstract: Classical false discovery rate (FDR) controlling procedures offer strong and interpretable guarantees but often lack the flexibility to work with complex data. By contrast, machine learning-based classification algorithms perform well on modern datasets but typically lack error-control guarantees. In this paper, we bring the two together by introducing a new adaptive novelty detection procedure with FDR control, called AdaDetect. It extends recent work in the multiple testing literature, notably Yang et al. (2021), to the high-dimensional setting. We prove that AdaDetect comes with finite-sample guarantees: it controls the FDR strongly and approximates the oracle in terms of power, with explicit remainder terms that are small under mild conditions. In practice, AdaDetect can be combined with any machine learning-based classifier, which lets the user choose the most relevant classification approach. We illustrate this on classical real-world datasets, for which random forest and neural network classifiers are particularly efficient. The versatility of our method is also shown with an astrophysical application.
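The abstract does not spell out the procedure, so the following is only a minimal sketch of a generic pattern for novelty detection with FDR control, not the authors' implementation: each test point gets an empirical p-value by ranking its score against scores from a reference sample of known nulls, and the Benjamini-Hochberg step-up rule is then applied at level alpha. The function names, the toy scores, and the convention that a larger score means "more novel" are all assumptions made for illustration. In practice the scores would come from whichever classifier the user chooses.

```python
def empirical_p_values(null_scores, test_scores):
    """Empirical p-value of each test score against a null reference sample.

    Assumption: a larger score means the point looks more like a novelty.
    """
    n = len(null_scores)
    return [
        (1 + sum(1 for c in null_scores if c >= s)) / (n + 1)
        for s in test_scores
    ]


def benjamini_hochberg(p_values, alpha):
    """Return the sorted indices rejected by the BH step-up rule at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value falls under the BH line
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Toy scores (hypothetical): 19 known nulls and 5 test points.
null_scores = list(range(1, 20))
test_scores = [6.5, 40, 10.5, 60, 30]

p_values = empirical_p_values(null_scores, test_scores)
detections = benjamini_hochberg(p_values, alpha=0.1)
print(p_values)
print(detections)
```

With only 19 reference scores the smallest attainable p-value is 1/20, so the size of the null sample limits how many detections are possible at a given level.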
Submission history
From: Ariane Marandon
[v1] Sat, 13 Aug 2022 17:14:55 GMT (5365kb,D)
[v2] Sat, 22 Oct 2022 08:35:12 GMT (5319kb,D)