Statistics > Methodology
Title: Scalable Bayesian regression in high dimensions with multiple data sources
(Submitted on 2 Oct 2017 (this version), latest version 21 Aug 2019 (v7))
Abstract: We consider high-dimensional regression involving multiple sources of covariates, motivated by biomedical applications with "wide data" and large total dimensionality p. As a starting point, we formulate a flexible ridge-type prior with shrinkage levels that are specific to data type or source. These multiple shrinkage levels are set in a data-driven manner using empirical Bayes. Importantly, all the proposed estimators can be formulated in terms of outer-product data matrices of size n × n, rendering computation fast and scalable in the "wide-data" setting, even for millions of predictors. We extend this approach towards sparse solutions based on constrained minimization of a certain Kullback-Leibler divergence. We also consider a relaxed variant that scales to large numbers of predictors, allows adaptive and source-specific shrinkage, and has a closed-form solution. The proposed methods are compared to standard penalized likelihood methods in a simulation study based on biomedical data.
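To illustrate why the n × n outer-product formulation keeps computation cheap when p is much larger than n, the following is a minimal sketch using the standard dual (Woodbury) identity for ridge regression with one penalty per source. It is not the paper's estimator: the function name `multi_source_ridge`, the fixed penalties in `lambdas` (the paper sets shrinkage levels by empirical Bayes) and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def multi_source_ridge(X_blocks, y, lambdas):
    """Ridge-type estimate with one penalty per source (hypothetical helper).

    Uses the identity
        (X'X + Lambda)^{-1} X'y = Lambda^{-1} X' (X Lambda^{-1} X' + I_n)^{-1} y,
    so the only linear system solved is n x n, regardless of p.
    """
    n = y.shape[0]
    # Weighted sum of per-source outer-product (Gram) matrices, plus identity.
    K = np.eye(n)
    for X_k, lam in zip(X_blocks, lambdas):
        K += (X_k @ X_k.T) / lam
    alpha = np.linalg.solve(K, y)  # n x n solve
    # Map the n-dimensional dual solution back to each source's coefficients.
    return [X_k.T @ alpha / lam for X_k, lam in zip(X_blocks, lambdas)]

# Simulated "wide" data: n = 20 samples, two sources with 50 and 200 covariates.
rng = np.random.default_rng(0)
n = 20
X_blocks = [rng.standard_normal((n, 50)), rng.standard_normal((n, 200))]
y = rng.standard_normal(n)
lambdas = [1.0, 10.0]  # assumed fixed here, purely for illustration

beta_blocks = multi_source_ridge(X_blocks, y, lambdas)
```

Each source contributes only through its n × n matrix `X_k @ X_k.T`, which can be computed once and reused when the penalties change; that reuse is what would make a search over source-specific shrinkage levels inexpensive in a sketch like this.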
Submission history
From: Konstantinos Perrakis
[v1] Mon, 2 Oct 2017 12:00:23 GMT (72kb,D)
[v2] Fri, 10 Nov 2017 14:09:30 GMT (83kb,D)
[v3] Wed, 29 Nov 2017 12:36:54 GMT (83kb,D)
[v4] Thu, 30 Nov 2017 10:23:39 GMT (83kb,D)
[v5] Fri, 19 Jan 2018 14:28:24 GMT (85kb,D)
[v6] Wed, 5 Jun 2019 08:44:31 GMT (389kb,D)
[v7] Wed, 21 Aug 2019 12:00:29 GMT (385kb,D)