

Incorporating Grouping Information into Bayesian Decision Tree Ensembles

JUNLIANG DU · Antonio Linero

Pacific Ballroom #224

Keywords: [ Ensemble Methods ] [ Bayesian Nonparametrics ] [ Bayesian Methods ]

Abstract: We consider the problem of nonparametric regression in the high-dimensional setting in which $P \gg N$. We study the use of overlapping group structures to improve prediction and variable selection. These structures arise commonly when analyzing DNA microarray data, where genes can naturally be grouped according to genetic pathways. We incorporate overlapping group structure into a Bayesian additive regression trees model using a prior constructed so that, if a variable from some group is used to construct a split, the probability that subsequent splits will use predictors from the same group increases. We refer to our model as an overlapping group Bayesian additive regression trees (OG-BART) model, and to our prior on the splits as an overlapping group Dirichlet (OG-Dirichlet) prior. Like the sparse group lasso, our prior encourages sparsity both within and between groups. We study the correlation structure of the prior, illustrate the proposed methodology on simulated data, and apply the methodology to gene expression data to learn which genetic pathways are predictive of breast cancer tumor metastasis.
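The reinforcement behavior described in the abstract can be illustrated with a small urn-style sketch. This is not the OG-Dirichlet prior, whose exact form is not stated in the abstract. It is a hypothetical two-stage scheme, assumed here for illustration only: pick a group with probability proportional to `alpha` plus its past usage count, then pick a variable within that group proportional to `beta` plus its past usage count. The function names, the parameters `alpha` and `beta`, and the toy groups are all assumptions.

```python
def split_probabilities(groups, group_counts, var_counts, num_vars,
                        alpha=1.0, beta=1.0):
    """Marginal probability that each variable is chosen for the next split.

    Hypothetical scheme: a group g is drawn with weight alpha + group_counts[g],
    then a member j of g with weight beta + var_counts[g][j]. A variable that
    belongs to several overlapping groups accumulates mass from each of them.
    """
    total_group_weight = sum(alpha + c for c in group_counts)
    probs = [0.0] * num_vars
    for g, members in enumerate(groups):
        p_group = (alpha + group_counts[g]) / total_group_weight
        total_var_weight = sum(beta + var_counts[g][j] for j in members)
        for j in members:
            probs[j] += p_group * (beta + var_counts[g][j]) / total_var_weight
    return probs


def record_split(group_counts, var_counts, g, j):
    """Record that a split used variable j, reached through group g."""
    group_counts[g] += 1
    var_counts[g][j] += 1


# Toy example (assumed data): five predictors, two groups overlapping on 2.
groups = [[0, 1, 2], [2, 3, 4]]
group_counts = [0, 0]
var_counts = [{j: 0 for j in members} for members in groups]

before = split_probabilities(groups, group_counts, var_counts, num_vars=5)
record_split(group_counts, var_counts, g=0, j=0)  # a split on variable 0
after = split_probabilities(groups, group_counts, var_counts, num_vars=5)
```

In this toy run, variables 1 and 3 start out equally likely. After one split on variable 0, variable 1 (which shares a group with variable 0) becomes more likely than variable 3 (which does not). Small values of `alpha` and `beta` would concentrate the draws on few groups and few variables within them, loosely mirroring the between-group and within-group sparsity the abstract mentions.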
