Bayesian additive regression trees (BART) is a Bayesian machine learning tool for nonparametric function estimation that has been shown to deliver outstanding performance in both variable selection and prediction accuracy. Unmodified, however, it has limitations for certain tasks. For example, in interaction detection it does not penalize the number of low-order interactions; instead, it tends to include extra interactions to gain slightly better predictive performance, which produces unwanted false positives. In Chapters 2 and 3, we propose two extensions of the original BART model that improve its performance on variable selection tasks.

In Chapter 1, we provide a brief review of the Bayesian Additive Regression Trees (BART) framework. We then show how to incorporate a sparsity assumption into the BART framework and how it aids prediction and variable selection. Next, in connection with our proposed solution to the interaction detection problem, we introduce what the interaction detection problem is and review the techniques currently available for it. Finally, we motivate the grouped variable selection problem, that is, why it is worth the effort to develop a method specifically for grouped variables, and briefly review the existing options for grouped variable selection.

In Chapter 2, we introduce the Dirichlet Process-Forest (DP-Forest) prior to address the issue of selecting too many interactions. The basic idea behind the proposed prior is to cluster the trees in the ensemble so that each cluster of trees focuses on learning a specific interaction structure. By doing this, the proposed model substantially reduces the number of false positives in interaction detection while maintaining a similar level of prediction accuracy and main-effect selection performance. We illustrate this point with several simulation studies.

In Chapter 3, we are motivated by genomic data in which the number of predictors is far larger than the number of observations. In genomic settings, the predictors in the dataset are often associated with known graphical structures, e.g., pathway information. We refer to this as grouping information, since it groups the predictors corresponding to the different pathways. We introduce overlapping group BART (OG-BART), which extends BART by utilizing this grouping information. One key advantage of our model is that it can be used even when the covariates fall into overlapping groups, e.g., when a particular gene is associated with more than one pathway. Another contribution is a study of the correlation structure of the OG-Dirichlet prior. The OG-BART prior is constructed to allow positive correlation among the BART splitting probabilities; that is, if a predictor is included in the model, the chance that predictors in the same group are also included increases. We apply our model to two simulation settings, one with non-overlapping groups and one with overlapping groups. In both simulation studies, our method outperforms the competing methods, and the results also demonstrate the need to utilize the grouping information to gain additional performance.

We close this dissertation in Chapter 4 with a discussion and possible areas for future work.