Title: Bayesian Additive Regression Trees using Bayesian Model Averaging

Abstract: Bayesian Additive Regression Trees (BART) is a statistical sum-of-trees model. It can be considered a Bayesian version of machine learning tree ensemble methods in which the individual trees are the base learners. However, for data sets where the number of variables $p$ is large (e.g. $p>5,000$), the algorithm can become computationally prohibitive.
Another popular method for high-dimensional data is random forests, a machine learning algorithm that grows trees using a greedy search for the best split points. However, as it is not a statistical model, it cannot produce probabilistic estimates or predictions.
We propose an alternative algorithm for BART called BART-BMA, which uses Bayesian Model Averaging and a greedy search algorithm to produce a model that is much more efficient than BART for data sets with large $p$. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm that can deal with high-dimensional data.
We have found that BART-BMA can be run in a reasonable time on a standard laptop for the "small $n$ large $p$" scenario, which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments: one to distinguish between patients with cardiovascular disease and controls, and another to classify aggressive from non-aggressive prostate cancer. We compare the results of BART-BMA with those of its main competitors.
Open source code written in R and Rcpp to run BART-BMA can be found at: this https URL
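
Illustration: the abstract names Bayesian Model Averaging but does not say how candidate models are weighted. The sketch below shows one common way this can be done, and it should not be read as the BART-BMA implementation. It assumes that each candidate model is scored by its BIC, that posterior model probabilities are approximated by exp(-BIC/2), and that models far from the best one are discarded by a threshold (an Occam's-window style cutoff). The function names `bma_weights` and `bma_predict` and the parameter `occam_window` are hypothetical and do not come from the authors' R code.

```python
import math


def bma_weights(bics, occam_window=None):
    """Approximate posterior model probabilities from BIC scores.

    Assumption for illustration: weight_i is proportional to
    exp(-0.5 * (BIC_i - min BIC)). If occam_window is given, models whose
    BIC exceeds the best BIC by more than that amount get weight zero.
    """
    best = min(bics)
    raw = []
    for b in bics:
        delta = b - best
        if occam_window is not None and delta > occam_window:
            raw.append(0.0)  # model falls outside the window, drop it
        else:
            raw.append(math.exp(-0.5 * delta))
    total = sum(raw)  # at least 1.0, since the best model has delta == 0
    return [r / total for r in raw]


def bma_predict(predictions, weights):
    """Average per-model predictions using the model weights.

    predictions: one list of predicted values per model, all the same length.
    """
    n_points = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions))
        for i in range(n_points)
    ]


# Hypothetical BIC scores for three candidate sum-of-trees models.
weights = bma_weights([100.0, 102.0, 150.0], occam_window=10.0)
averaged = bma_predict([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], weights)
```

In this toy input the third model lies outside the window, so it contributes nothing to the averaged prediction, and the remaining two models share the weight according to their BIC difference.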
Subjects: Computation (stat.CO); Methodology (stat.ME)
Cite as: arXiv:1507.00181 [stat.CO]
  (or arXiv:1507.00181v2 [stat.CO] for this version)

Submission history

From: Belinda Hernandez
[v1] Wed, 1 Jul 2015 10:58:46 GMT (152kb,D)
[v2] Wed, 8 Jul 2015 14:08:27 GMT (152kb,D)
