Statistics > Computation
Title: Bayesian Additive Regression Trees using Bayesian Model Averaging
(Submitted on 1 Jul 2015 (v1), last revised 8 Jul 2015 (this version, v2))
Abstract: Bayesian Additive Regression Trees (BART) is a statistical sum-of-trees model. It can be considered a Bayesian version of machine learning tree ensemble methods in which the individual trees are the base learners. However, for data sets where the number of variables $p$ is large (e.g. $p>5,000$), the algorithm can become computationally prohibitive.
Another method which is popular for high dimensional data is random forests, a machine learning algorithm which grows trees using a greedy search for the best split points. However, as it is not a statistical model, it cannot produce probabilistic estimates or predictions.
We propose an alternative algorithm for BART called BART-BMA, which uses Bayesian Model Averaging and a greedy search algorithm to produce a model that is much more efficient than BART for data sets with large $p$. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm that can deal with high-dimensional data.
We have found that BART-BMA can be run in a reasonable time on a standard laptop for the "small $n$ large $p$" scenario, which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments: one to distinguish between patients with cardiovascular disease and controls, and another to classify aggressive from non-aggressive prostate cancer. We compare our results to those of its main competitors.
Open source code written in R and Rcpp to run BART-BMA can be found at: this https URL
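The model-averaging idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the BART-BMA implementation: it assumes the common approximation in which each candidate model's posterior weight is proportional to $\exp(-\mathrm{BIC}/2)$, and the function names, BIC values and predictions below are hypothetical and chosen only for illustration.

```python
import math


def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values.

    Assumption: weight_k is proportional to exp(-BIC_k / 2), a standard
    approximation in Bayesian model averaging.
    """
    best = min(bics)
    # Subtract the smallest BIC before exponentiating for numerical stability.
    unnormalised = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def bma_predict(predictions, weights):
    """Weighted average of per-model predictions (one list per model)."""
    n_obs = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions))
        for i in range(n_obs)
    ]


# Hypothetical BIC values and predictions for three candidate models.
bics = [100.0, 102.0, 110.0]
predictions = [[1.0, 2.0], [1.2, 1.8], [3.0, 0.0]]

weights = bma_weights(bics)
averaged = bma_predict(predictions, weights)
print(weights)
print(averaged)
```

Models with lower BIC receive more weight, so the averaged prediction stays close to the best-scoring models while still reflecting uncertainty about which model is correct.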
Submission history
From: Belinda Hernandez
[v1] Wed, 1 Jul 2015 10:58:46 GMT (152kb,D)
[v2] Wed, 8 Jul 2015 14:08:27 GMT (152kb,D)