BFTS: Thompson Sampling with Bayesian Additive Regression Trees

About

Contextual bandits are a core technology for personalized mobile health interventions, where decision-making requires adapting to complex, non-linear user behaviors. While Thompson Sampling (TS) is a preferred strategy for these problems, its performance hinges on the quality of the underlying reward model. Standard linear models suffer from high bias, while neural network approaches are often brittle and difficult to tune in online settings. Conversely, tree ensembles dominate tabular data prediction but typically rely on heuristic uncertainty quantification, lacking a principled probabilistic basis for TS. We propose Bayesian Forest Thompson Sampling (BFTS), the first contextual bandit algorithm to integrate Bayesian Additive Regression Trees (BART), a fully probabilistic sum-of-trees model, directly into the exploration loop. We prove that BFTS is theoretically sound, deriving an information-theoretic Bayesian regret bound of $\tilde{O}(\sqrt{T})$. As a complementary result, we establish frequentist minimax optimality for a "feel-good" variant, confirming the structural suitability of BART priors for non-parametric bandits. Empirically, BFTS achieves state-of-the-art regret on tabular benchmarks with near-nominal uncertainty calibration. Furthermore, in an offline policy evaluation on the Drink Less micro-randomized trial, BFTS improves engagement rates by over 30% compared to the deployed policy, demonstrating its practical effectiveness for behavioral interventions.

Ruizhe Deng, Bibhas Chakraborty, Ran Chen, Yan Shuo Tan• 2026

Related benchmarks

Task	Dataset	Result
Contextual Bandit	MagicTelescope OpenML	Final Cumulative Regret1.49e+3	13
Contextual Bandit	OpenML Adult T=10,000	Cumulative Regret (Mean)1.54e+3	7
Contextual Bandit	OpenML Mushroom (T=8124)	Mean Cumulative Regret53.9	7
Contextual Bandit	OpenML Shuttle T=10,000	Cumulative Regret (Mean)106.9	7
Contextual Bandit	OpenML (Adult, Covertype, EEGEyeState, GasDrift, MagicTelescope, Mushroom, PageBlocks, Shuttle)	Adult Score1.55e+3	7
Contextual Bandits	Synthetic Benchmarks T=10,000, R=5	Friedman Score680.3	7
Contextual Bandit	Adult (OpenML)	Final Cumulative Regret1.54e+3	6
Contextual Bandit	Mushroom (OpenML)	Final Regret53.9	6
Contextual Bandit	Shuttle OpenML	Final Cumulative Regret106.9	6
Contextual Bandit	PageBlocks OpenML	Final Cumulative Regret266.3	6

Showing 10 of 22 rows

Other info

Follow for update

@wizwand_team Discord