BART: Bayesian additive regression trees

About

We develop a Bayesian "sum-of-trees" model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior. Effectively, BART is a nonparametric Bayesian regression approach which uses dimensionally adaptive random basis elements. Motivated by ensemble methods in general, and boosting algorithms in particular, BART is defined by a statistical model: a prior and a likelihood. This approach enables full posterior inference including point and interval estimates of the unknown regression function as well as the marginal effects of potential predictors. By keeping track of predictor inclusion frequencies, BART can also be used for model-free variable selection. BART's many features are illustrated with a bake-off against competing methods on 42 different data sets, with a simulation experiment and on a drug discovery classification problem.

Hugh A. Chipman, Edward I. George, Robert E. McCulloch• 2008

Related benchmarks

Task	Dataset	Result
Individual Treatment Effect Estimation	IHDP (within-sample)	Sqrt PEHE2.1	59
Individual Treatment Effect Estimation	IHDP (out-of-sample)	sqrt(PEHE)2.3	55
CATE estimation	IHDP 100 train test splits (out-sample)	ERout5.07	53
Classification	Breast	--	44
Classification	Churn	AUROC0.849	33
Policy Error Rate Estimation	HC-MNIST (out-sample)	Error Rate (Out-Sample)17.51	33
Individual Treatment Effect Estimation	Jobs (out-of-sample)	R_pol0.25	32
Regression	Abalone	RMSE2.197	26
Classification	breast-w	ROC-AUC0.977	21
Individual Treatment Effect Estimation	Jobs (within-sample)	R_pol0.23	18

Showing 10 of 45 rows

Other info

Code

Follow for update

@wizwand_team Discord