Title: Probabilistic Best Subset Selection via Gradient-Based Optimization
(Submitted on 11 Jun 2020 (v1), revised 7 Aug 2020 (this version, v3), latest version 1 Jun 2022 (v4))
Abstract: In high-dimensional statistics, variable selection is an optimization problem aiming to recover the latent sparse pattern from all possible covariate combinations. In this paper, we propose a novel optimization method to solve the exact $L_0$-regularized regression problem (a.k.a. best subset selection). We reformulate the optimization problem from a discrete space to a continuous one via probabilistic reparameterization. Within the framework of stochastic gradient descent, we propose a family of unbiased gradient estimators to optimize the $L_0$-regularized objective and a variational lower bound. Within this family, we identify the estimator with a non-vanishing signal-to-noise ratio and uniformly minimum variance. Theoretically, we study the general conditions under which the method is guaranteed to converge to the ground truth in expectation. In a wide variety of synthetic and semi-synthetic data sets, the proposed method outperforms existing variable selection methods that are based on penalized regression and mixed-integer optimization, in both sparse pattern recovery and out-of-sample prediction. Our method can find the true regression model from thousands of covariates in a couple of seconds.
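The probabilistic reparameterization described in the abstract can be illustrated with a small sketch. It assumes independent Bernoulli inclusion gates z_j with probabilities sigmoid(phi_j), profiles the regression coefficients out by ordinary least squares on the selected columns, and minimizes the expected penalized loss E[RSS(z)/n + lam * |z|] over the logits phi by stochastic gradient descent. The gradient uses a generic score-function (REINFORCE) estimator with a leave-one-out baseline, which is unbiased but is not the minimum-variance estimator identified by the authors. The function names `subset_loss` and `prob_subset_select`, the 1/n scaling, the penalty value, and the step settings are all hypothetical choices for illustration.

```python
import numpy as np


def subset_loss(X, y, mask, lam):
    """L0-penalized least-squares objective for one 0/1 inclusion mask.

    Assumption for this sketch: coefficients are profiled out by ordinary
    least squares on the selected columns, and the RSS is scaled by 1/n.
    """
    n = X.shape[0]
    k = int(mask.sum())
    if k == 0:
        rss = float(y @ y)  # empty model: nothing is explained
    else:
        Xs = X[:, mask]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
        rss = float(resid @ resid)
    return rss / n + lam * k


def prob_subset_select(X, y, lam, n_steps=600, n_samples=16, lr=0.5, seed=0):
    """Minimize E_{z ~ Bernoulli(sigmoid(phi))}[subset_loss(z)] by SGD on phi.

    Hypothetical helper: uses the score-function (REINFORCE) gradient with a
    leave-one-out baseline, a generic unbiased estimator, not the paper's.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    phi = np.zeros(p)  # logits: every inclusion probability starts at 0.5
    for _ in range(n_steps):
        pi = 1.0 / (1.0 + np.exp(-phi))
        z = rng.random((n_samples, p)) < pi  # K sampled boolean masks
        f = np.array([subset_loss(X, y, zk, lam) for zk in z])
        # Baseline for sample k uses only the other samples, so it is
        # independent of z_k and the gradient estimate stays unbiased.
        baseline = (f.sum() - f) / (n_samples - 1)
        # d/dphi log Bernoulli(z; sigmoid(phi)) = z - sigmoid(phi)
        score = z.astype(float) - pi
        grad = ((f - baseline)[:, None] * score).mean(axis=0)
        phi -= lr * grad
    return 1.0 / (1.0 + np.exp(-phi))


# Synthetic check: 10 covariates, only the first three are active.
data_rng = np.random.default_rng(1)
n, p = 100, 10
X = data_rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 2.5]
y = X @ beta_true + data_rng.standard_normal(n)

probs = prob_subset_select(X, y, lam=0.5)
selected = np.flatnonzero(probs > 0.5)
print(np.round(probs, 3))
print("selected covariates:", selected)
```

The leave-one-out baseline is used because it lowers the variance of the score-function estimator without introducing bias: each draw's baseline depends only on the other draws. On this synthetic example the inclusion probabilities of the three active covariates should move toward 1 and the others toward 0, although the exact values depend on the seed and step settings.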
Submission history
From: Mingzhang Yin
[v1] Thu, 11 Jun 2020 13:57:29 GMT (217kb,D)
[v2] Mon, 22 Jun 2020 18:28:46 GMT (217kb,D)
[v3] Fri, 7 Aug 2020 04:23:34 GMT (154kb,D)
[v4] Wed, 1 Jun 2022 01:59:07 GMT (215kb,D)