Tree Ensembles for Contextual Bandits
About
We propose a new framework for contextual multi-armed bandits based on tree ensembles. Our framework adapts two widely used bandit methods, Upper Confidence Bound and Thompson Sampling, for both standard and combinatorial settings. As part of this framework, we propose a novel method of estimating the uncertainty in tree ensemble predictions. We further demonstrate the effectiveness of our framework via several experimental studies, employing XGBoost and random forests, two popular tree ensemble methods. Compared to state-of-the-art methods based on decision trees and neural networks, our methods exhibit superior performance in terms of both regret minimization and computational runtime, when applied to benchmark datasets and the real-world application of navigation over road networks.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Contextual Bandit | MagicTelescope OpenML | Final Cumulative Regret1.59e+3 | 13 | |
| Contextual Bandit | OpenML Adult T=10,000 | Cumulative Regret (Mean)1.56e+3 | 7 | |
| Contextual Bandit | OpenML Mushroom (T=8124) | Mean Cumulative Regret69.1 | 7 | |
| Contextual Bandit | OpenML Shuttle T=10,000 | Cumulative Regret (Mean)170.4 | 7 | |
| Contextual Bandit | EEGEyeState OpenML | Final Cumulative Regret2.13e+3 | 6 | |
| Contextual Bandit | Adult (OpenML) | Final Cumulative Regret1.56e+3 | 6 | |
| Contextual Bandit | Mushroom (OpenML) | Final Regret69.1 | 6 | |
| Contextual Bandit | Shuttle OpenML | Final Cumulative Regret170.4 | 6 | |
| Contextual Bandit | Covertype (OpenML) | Final Cumulative Regret3.54e+3 | 6 | |
| Contextual Bandit | MNIST OpenML | Cumulative Regret4.08e+3 | 6 |