Mathematics > Statistics Theory
Title: Estimating false inclusion rates in penalized regression models
(Submitted on 19 Jul 2016 (this version), latest version 7 Apr 2017 (v2))
Abstract: Penalized regression methods are an attractive tool for feature selection with many appealing properties, although their widespread adoption has been hampered by the difficulty of applying inferential tools. In particular, the question "How reliable is the selection of those features?" has proved difficult to address, partially due to the complexity of defining a false discovery in the penalized regression setting. Here, I define a false inclusion as a variable that is independent of the outcome regardless of whether other variables are conditioned on. This definition permits straightforward estimation of the number of false inclusions. Theoretical analysis and simulation studies demonstrate that this approach is quite accurate when the correlation among predictors is mild, and slightly conservative when the correlation is moderate. Finally, the practical utility of the proposed method is illustrated using gene expression data from The Cancer Genome Atlas and GWAS data from the Myocardial Applied Genomics Network.
Submission history
From: Patrick Breheny
[v1] Tue, 19 Jul 2016 15:37:25 GMT (53kb,D)
[v2] Fri, 7 Apr 2017 14:31:58 GMT (70kb,D)